Andreas Fickers, Juliane Tatarinov, Tim van der Heijden
Digital history and hermeneutics — 
between theory and practice: 

An introduction 


Introduction 


In a 2020 special issue of the journal Digital Humanities Quarterly, Urszula Paw- 
licka-Deger proclaims a “laboratory turn” within the field of digital humanities, 
representing a paradigm shift in humanities research infrastructure in both Europe
and the United States.¹ She locates this turn within discourses of knowledge
production in academia and emphasizes a “shift from a laboratory as a physical 
location to conceptual laboratory.”² This shift, she argues, implies certain values
and a new way of thinking and communicating, mirrored in research and train- 
ing programs. This volume aims to situate itself in the current debate on the so- 
called laboratory turn of digital humanities by offering experience-based insights 
into the learnings and failures, intellectual gains and conceptual struggles, and 
practical challenges and opportunities of a laboratory-like training environment: 
the Doctoral Training Unit “Digital History and Hermeneutics” (DTU-DHH), affiliated
to the Luxembourg Centre for Contemporary and Digital History (C²DH) at
the University of Luxembourg.³ The contributions to this volume reflect on the
methodological and epistemological challenges and tensions that this DTU faced 
as a four-year interdisciplinary research program. As a laboratory setting, the 
DTU created an interdisciplinary home base for researchers from various episte- 
mic cultures and disciplinary traditions. Framed by the concept of digital herme- 
neutics, the chapters offer a broad portfolio of reflexive approaches to the field 
of digital history, combining the individual research experiences of PhD students 
with more general reflections on the validity and heuristic potential of central 
concepts and methods in the field of digital humanities. 


1 Urszula Pawlicka-Deger, “The Laboratory Turn: Exploring Discourses, Landscapes, and 
Models of Humanities Labs,” Digital Humanities Quarterly 14, no. 3 (2020). 

2 Pawlicka-Deger, “The Laboratory Turn,” paragraph 2. 

3 Financed within the PRIDE scheme of the Luxembourg National Research Fund (FNR) and 
supported by the University of Luxembourg, the Doctoral Training Unit “Digital History and 
Hermeneutics” provided an experimental training and research environment for 13 PhD stu- 
dents, their supervisors, and a coordinating postdoctoral researcher. For more information see 
the project website: https://dhh.uni.lu, accessed December 3, 2021. 


Open Access. © 2022 Andreas Fickers, et al., published by De Gruyter. This work is
licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110723991-001




The Doctoral Training Unit was based on two central concepts: the concept 
of trading zone and the concept of digital hermeneutics. In order to reflect on 
the ongoing developments in the field of digital history — which can be seen as 
a specific area within the broader field of digital humanities — the DTU was 
conceived as a space of experimentation where different epistemic cultures, 
disciplinary traditions and communities of practice would mangle and new 
forms of knowledge in the making would be negotiated.⁴ As the members of
the DTU consisted of historians, philosophers, computer scientists, geogra- 
phers, information scientists, and experts on human-computer interaction, col- 
laborating in this interdisciplinary setting meant interacting in an intellectual 
climate characterized by experimentation, creative uncertainty, and appropria- 
tion of new tools and methodologies for doing digital history research. Framing 
the DTU in sociological terms as a “trading zone” in which different communi- 
ties of practice interact, the unit was designed as a collaborative space of 
knowledge production in which methodological interdisciplinarity and theoret- 
ical bricolage formed the mental framework for critical debate and discussion. 
Inevitably, this asked for serious intellectual and communicative investments 
by all partners involved, including supervisors and external experts, as well as 
the doctoral students. 

In this sense, the DTU approached digital history as what Julie Thompson 
Klein refers to as “deep interdisciplinarity”:⁵ a mode of collaboration that can
alter disciplinary practices and create new hybrid languages. But how can one 
constitute and operate such an interdisciplinary trading zone in practice? How 
can one design such a collaborative space within the existing structures of a 
university environment?⁶ In contrast to similar interdisciplinary setups which
generally share a topical or methodological focus, the themes and approaches 
within the DTU-DHH framework were very broad, reflecting the wide range of 
research questions and methodological designs of the individual research proj- 
ects. This diversity of topics and approaches was mirrored by the broad range 
of sources and data to be studied: these ranged from textual data (corpora of 


4 On the concept of “mangle”, see: Andrew Pickering, “The Mangle of Practice: Agency and 
Emergence in the Sociology of Science,” American Journal of Sociology 99, no. 3 (1993): 559-89. 
5 Julie Thompson Klein, Interdisciplining Digital Humanities: Boundary Work in an Emerging 
Field (Ann Arbor: University of Michigan Press, 2015), 142. 

6 For a discussion of the role of digital humanities centres in the facilitation of interdisciplin- 
ary knowledge see: Mila Oiva, “The Chili and Honey of Digital Humanities Research: The Facil- 
itation of the Interdisciplinary Transfer of Knowledge in Digital Humanities Centers,” Digital 
Humanities Quarterly 14, no. 3 (2020). On C²DH’s establishment at the University of Luxembourg
see: Max Kemman, Trading Zones of Digital History (Oldenbourg: De Gruyter, 2021),
69-81. 
nineteenth-century psychiatric journals, twentieth-century Indigenous Australian
autobiographies, transcripts of US presidential television debates), oral testimonies
(toponymies, oral interviews), pictures (photographs, early modern
constcamer paintings), material objects (computers, museum objects), archaeo- 
logical data (Roman inscriptions, excavations of Stone Age settlements) to com- 
puter models (historical networks, agent-based models). All of the resulting 
datasets were used to test assumptions, to question existing field knowledge, 
and to develop new layers of interpretative framing. Inspired by the call of Fred 
Gibbs and Trevor Owens to “publicly experiment with ways of writing about 
their methodologies, procedures, and experiences with historical data as a kind 
of text,”⁷ we encouraged our PhD students to reflect on the “usage” of historical
data not simply as evidence and “self-identical”⁸ but from multiple viewpoints
and based on the principles of digital hermeneutics. 


Building a trading zone 


The DTU was designed and conceptualized as an interdisciplinary trading zone 
within the field of digital history.⁹ We define a trading zone as an intellectual
space and social place for knowledge transfer and exchange between different 
knowledge domains and their “communities of practice”: groups of people who 
collectively engage in shared learning activities and base their group identity on a 
shared craft, domain and practice.¹⁰ Translated to the field of digital history, the
concept seems useful for studying and analyzing how different communities of 
practice interact and negotiate within an interdisciplinary setting. In Trading 
Zones of Digital History, Max Kemman describes digital history as a trading zone 
between the “two cultures” of humanities and computational research.¹¹ In this


7 Frederick W. Gibbs and Trevor J. Owens, “Hermeneutics of Data and Historical Writing (Fall 
2011 Version),” in Writing History in the Digital Age, ed. Jack Dougherty and Kristen Nawrotzki, 
2011. 

8 Johanna Drucker, “Humanistic Theory and Digital Scholarship,” in Debates in the Digital 
Humanities, ed. Matthew K. Gold (Minneapolis: University of Minnesota Press, 2012), 85-95. 

9 See for a detailed reflection on the DTU as interdisciplinary digital history trading zone: An- 
dreas Fickers and Tim van der Heijden, “Inside the Trading Zone: Thinkering in a Digital His- 
tory Lab,” Digital Humanities Quarterly 14, no. 3 (2020). 

10 On situated practices in the field of digital humanities, see the special issue “Lab and 
Slack” of the journal Digital Humanities Quarterly vol. 14, no. 3 (2020). 

11 Kemman, Trading Zones of Digital History, 40. Cf. C.P. Snow, The Two Cultures and the Sci- 
entific Revolution (Cambridge: Cambridge University Press, 1959). 
trading zone, Kemman argues, both historians and computer or data scientists are
mutually involved in developing new research questions, designing methodologi- 
cal approaches and experimenting with new research practices. While historians 
collaborate with computational experts aiming at adjusting digital tools and 
methods in order to produce new or alternative interpretations of the past, 
computational experts are driven by a problem-solving approach, testing how 
computational methods and techniques can help to make sense of heterogeneous, 
imperfect, and often incomprehensive data collections.¹² As such, the trading
zone has proven to be a useful heuristic concept for the analysis of sociocultural 
interactions, conceptual negotiations, and interactional practices that have 
emerged during the lifetime of the DTU. 


Three aspects of trading zones 


Based on our experiences with running the DTU-DHH, three elements of the unit 
as a trading zone are important to emphasize: (1) locality, (2) interdisciplinarity, 
and (3) the establishment of a common ground and shared language.¹³ Historian
of science Peter Galison defined a trading zone as “an arena in which radically 
different activities could be locally, but not globally, coordinated.”¹⁴ This definition
of the trading zone concept emphasizes the role of locality and the importance
of a collaborative space to facilitate interactions between different communities of 
practice. In the design of the DTU, the aspect of locality played an important role. 
Instead of working in different offices and departments, the PhD students were of- 
fered one shared office space: the so-called “open space.” Apart from having a 
shared office space, the group frequently interacted in other localities of the C²DH,
most importantly the Digital History Lab, where the DTU skills trainings and research
seminars took place.

Besides locality, interdisciplinarity is a central characteristic of a trading zone: 
the transfer and exchange of concepts, methods, tools, techniques and skills be- 
tween or across different disciplinary fields or knowledge domains. Since digital 
historians have been using research methods and tools from the computer scien- 
ces and other knowledge domains such as geographical information systems, 


12 Kemman, Trading Zones of Digital History, 3. 

13 For a more detailed analysis of these three aspects of digital history trading zones, see: 
Fickers and van der Heijden, “Inside the Trading Zone.” 

14 Peter Galison, “Computer Simulations and the Trading Zone,” in The Disunity of Science: 
Boundaries, Contexts, and Power, ed. Peter Galison and David J. Stump (Stanford, California: 
Stanford University Press, 1996), 119. Original emphasis. 
human-computer interaction, computational linguistics, and network analysis,
digital history can be understood as an interdisciplinary field by definition. At the 
same time, some of the long-standing “epistemic differences”¹⁵ between historians,
computer scientists, and other disciplines continue to exist. While computer
scientists, for instance, make use of quantitative methods and computational 
models to produce scientific evidence and to “explain” or “simulate” the world, 
historians mostly deploy qualitative and hermeneutic methods in trying to “understand”
the complexities of past realities.¹⁶ These different scientific traditions,
despite the shared use of digital infrastructures, data, and tools, continue to
have a strong resonance when it comes to the epistemological and methodologi- 
cal foundations of disciplines and the self-understandings of researchers within 
those communities of practice. Differences in research design and methodology 
(quantitative versus qualitative), approach (i.e. machine-based “distant reading” 
versus individual “close reading” of text corpora), and ambitions (to find general 
scientific laws versus the production of original subjective interpretations in the 
humanities) created challenging “boundary objects”¹⁷ in our trading zone.

The aim of the DTU was to overcome such epistemic differences by establish- 
ing a common ground. As interactional expertise is based on successful communi- 
cation, a shared vocabulary is a crucial element in all interdisciplinary research. 
After all, certain terms and concepts can mean different things to different scholars
or communities of practice. Historians speak about “sources,” whereas librarians
and archivists talk about “documents” and computer scientists refer to
“data.” Such terms and concepts are typical boundary objects, which have to be
negotiated in order to enable a shared understanding. Whether such a common 
vocabulary or language really emerges, however, depends very much on the type 
of trading zone one is interacting with. In their article “Trading Zones and Inter- 
actional Expertise,” Collins, Evans and Gorman distinguish between four types 
of trading zones: inter-language, subversive, enforced, and fractionated.¹⁸


15 Karin Knorr Cetina, Epistemic Cultures: How the Sciences Make Knowledge (Cambridge, 
Mass.: Harvard University Press, 1999). 

16 Andreas Fickers, “Veins Filled with the Diluted Sap of Rationality: A Critical Reply to Rens 
Bod,” BMGN - Low Countries Historical Review 128, no. 4 (2013). 

17 Susan Leigh Star and James R. Griesemer, “Institutional Ecology, ‘Translations’ and Boundary 
Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39,” 
Social Studies of Science 19, no. 3 (1989): 387-420; Pascale Trompette and Dominique Vinck, 
“Revisiting the notion of Boundary Object,” Revue d’anthropologie des connaissances 3, no. 1 
(2009): 3-25. 

18 Harry Collins, Robert Evans, and Mike Gorman, “Trading Zones and Interactional Exper- 
tise,” Studies in History and Philosophy of Science Part A 38, no. 4 (December 2007): 657-66. 
According to the sociologists of knowledge, the type of trading zone depends
on whether a group is homogeneous or heterogeneous, and whether the “trad- 
ing” or group dynamics are based on collaboration or coercion. They argue that 
“inter-language trading zones” may only develop in groups with strong collabo- 
ration and high homogeneity — as opposed to enforced trading zones, which are 
characterized by high heterogeneity and high coercion. The DTU has been char- 
acterized by such high heterogeneity since the beginning of the project, given 
the group’s diverse mix of disciplinary backgrounds, ages, and nationalities.
Being familiar with the work of Julie Thompson Klein, we were cautioned that, 
although the heterogeneity of our DTU could potentially generate highly innovative
outputs, it could also turn into a source of conflict.¹⁹ By means of the so-called
“digital humanities incubation phase,” we aimed to establish a common
ground and shared language in order to stimulate interdisciplinary exchanges 
and collaborations within the project team, and so to transform the DTU into an 
inter-language trading zone in digital history. 


Digital hermeneutics as critical framework 
and research agenda 


While the concept of a trading zone is helpful in gaining a better insight into 
the complexity of interdisciplinary research practices, with their multi-layered 
challenges, on a theoretical as well as a practical level, the DTU aimed at mak- 
ing these challenges explicit — and objects of critical reflection by all partici- 
pants. Nowadays, all stages of realizing a digital history project are to a lesser 
or greater degree shaped by the use of digital infrastructures and tools. Be it 
browsing on the Internet, taking notes of an interview on a laptop, capturing 
digital photographs in archives or museum collections, recording an oral testi- 
mony on a mobile phone, or organizing crowdsourcing activities on the Web, 
the workflow of historical research is characterized by digital interventions.²⁰
We use “digital hermeneutics” as a concept that enables historians to critically 
reflect on the various interventions of digital research infrastructures, tools, 


19 Klein, Interdisciplining Digital Humanities, 138. 

20 On the notion of “digital intervention” in doing public history, see: Anita Lucchesi, “For a 
New Hermeneutics of Practice in Digital Public History: Thinkering with memorecord.uni.lu” 
(PhD dissertation, University of Luxembourg, 2020). 
databases, and dissemination platforms in the process of thinking, doing and
narrating history.²¹

Although one can argue that all historians have by now become digital,²²
one has to emphasize the fact that many remain strongly embedded in analog 
practices and traditions. This current duality or parallelism of analog and digital 
practices forces historians to experiment with the new while keeping established 
norms of valid historical practices alive. If we accept that “hybridity is the new 
normal,”²³ we need an update of historical hermeneutics problematizing the “in-betweenness”
of current history practices.²⁴ Instead of falling into the trap of
asymmetric conceptions (“analog” versus “digital”), the concept of digital her- 
meneutics proposes a critical framework for making the methodological and 
epistemological tensions in current history practices explicit.²⁵ Making the “interventions”
of the digital into historical practices explicit first of all asks for a
critical engagement with digital infrastructures, data, and tools — a hands-on ap- 
proach that combines playful tinkering with critical thinking. This idea of “thin- 
kering” as a heuristic mode of doing has informed both the individual work of 
PhD students and the organization of collective skills training and hands-on re- 
search seminars within the DTU. As the many reflexive blog entries under the 
“thinkering” label on the C²DH website²⁶ and DTU website²⁷ demonstrate, the




21 On the idea of digital hermeneutics see: Manfred Thaller, “The Need for a Theory of Histori- 
cal Computing,” Historical Social Research/Historische Sozialforschung, no. 29 (1991): 193-202; 
Joris J. van Zundert, “Screwmeneutics and Hermenumericals: The Computationality of Herme- 
neutics,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and 
John Unsworth (London: Wiley-Blackwell, 2016), 331-47; Stephen Ramsay, “The Hermeneutics 
of Screwing Around; or What You Do with a Million Books,” in Pastplay: Teaching and Learn- 
ing History with Technology, ed. Kevin Kee (Ann Arbor: University of Michigan Press, 2014), 
111-20. From a philosophical perspective, see: Alberto Romele, Digital Hermeneutics: Philo- 
sophical Investigations in New Media and Technologies (New York: Routledge, 2020). 

22 See: Daniel J. Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, 
and Presenting the Past on the Web (Philadelphia: University of Pennsylvania Press, 2006). 

23 Gerben Zaagsma, “On Digital History,” BMGN - Low Countries Historical Review 128, no. 4 
(December 16, 2013): 3-29. 

24 Andreas Fickers, “Update für die Hermeneutik. Geschichtswissenschaft auf dem Weg zur
digitalen Forensik?,” Zeithistorische Forschungen/Studies in Contemporary History 17, no. 1 
(2020): 157-68. 

25 Reinhart Koselleck, “Zur historisch-politischen Semantik asymmetrischer Gegenbegriffe,” 
in Vergangene Zukunft: zur Semantik geschichtlicher Zeiten (Frankfurt a.M: Suhrkamp, 1989), 
211-59. 

26 Luxembourg Centre for Contemporary and Digital History, https://c2dh.uni.lu/thinkering, 
accessed December 3, 2021. 

27 Doctoral Training Unit “Digital History and Hermeneutics”, https://dhh.uni.lu/category/ 
blog/, accessed December 3, 2021. 




concept of digital hermeneutics has been instrumental in critically reflecting on 
how digital tools and infrastructures are transforming historical research practi- 
ces in all stages of the iterative research process. As a comprehensive framework 
of epistemological and methodological investigation, it invites us to approach 
the historical research practices of search, data management and curation, anal- 
ysis and visualization, interpretation and publication, by: 

- opening the black boxes of algorithm-driven search engines and reflecting
on the heuristics of search in online catalogs and repositories²⁸

- thinking about the six Vs of data integrity (volume, velocity, variety, validity,
veracity, value) and training us in historical data criticism²⁹

- understanding and critically reflecting on how digital tools co-create the
epistemic objects of study and turn the user into a manipulator of highly
specific research instruments³⁰

- deconstructing the “look of certainty” of data visualization by exploring
the indexical relationship between the “back end” and “front end” of dynamic
interfaces³¹

- developing multimodal literacy in order to decode narrative conventions of
transmedia storytelling and the relational logic of web-applications and archives
when interpreting and publishing historical data.³²


28 David Gugerli, Suchmaschinen: die Welt als Datenbank (Frankfurt a.M: Suhrkamp, 2009); 
Ronald E. Day, Indexing It All: The Subject in the Age of Documentation, Information, and Data 
(Cambridge, Mass.: MIT Press, 2014); Jessica Hurley, “Aesthetics and the Infrastructural Turn 
in the Digital Humanities,” American Literature 88, no. 3 (September 2016): 627-37. 

29 Carl Lagoze, “Big Data, Data Integrity, and the Fracturing of the Control Zone,” Big Data & 
Society 1, no. 2 (July 10, 2014): 1-11; Bruno J. Strasser and Paul N. Edwards, “Big Data Is the 
Answer. . . But What Is the Question?,” Osiris 32, no. 1 (2017): 328-45. 

30 Marijn Koolen, Jasmijn van Gorp, and Jacco van Ossenbruggen, “Toward a Model for Digi- 
tal Tool Criticism: Reflection as Integrative Practice,” Digital Scholarship in the Humanities 34, 
no. 2 (June 1, 2019): 368-85; Karin van Es, Maranke Wieringa, and Mirko Tobias Schäfer, “Tool 
Criticism and the Computational Turn: A ‘Methodological Moment’ in Media and Communica- 
tion Studies,” M&K Medien & Kommunikationswissenschaft 69, no. 1 (2021): 46-64. 

31 Johanna Drucker, “Performative Materiality and Theoretical Approaches to Interface,” Digi- 
tal Humanities Quarterly 7, no. 1 (2013); David M. Berry, Critical Theory and the Digital 
(New York: Bloomsbury, 2014); Alexander R. Galloway, The Interface Effect (London: Polity, 
2012); Johanna Drucker, Visualization and Interpretation: Humanistic Approaches to Display 
(Cambridge, Mass.: MIT Press, 2020). 

32 Steve F. Anderson, Technologies of History: Visual Media and the Eccentricity of the Past, 
Interfaces, Studies in Visual Culture (Hanover, NH: Dartmouth College Press, 2011); Niels Brügger,
The Archived Web: Doing History in the Digital Age (Cambridge, Mass.: MIT Press, 2018);
Tracey Bowen and Carl Whithaus, eds., Multimodal Literacies and Emerging Genres (Pitts- 
burgh, Pennsylvania: University of Pittsburgh Press, 2013). 
As mentioned earlier, the original idea of the DTU was to reflect on the multiple
interferences of digital infrastructures and tools on the “classical” research 
flow of historical research — encompassing the search for sources, the data 
management and curation, the analysis and visualization, and finally the her- 
meneutic interpretation and storytelling. For this, we argued, new critical skills 
are necessary: algorithm criticism, digital source criticism, tool criticism, inter- 
face criticism, and simulation criticism. All these digital skills and competences 
should be part of the toolkit of digital historians, symbolizing the “reflexive 
turn” in digital humanities.³³

Whereas the plasticity of the linear structure of a research process comprising
clearly defined steps³⁴ provided a good starting point to engage the interdisciplinary
group with the concept of digital hermeneutics and to critically reflect
on this process in practice, it soon became apparent that all stages were in fact 
fluid, interconnected, and often conducted in parallel (Fig. 1). Following Stephen
Ramsay and Joris van Zundert one could stress that “the screwing around
with data”³⁵ to test tools and methods during the research process implies that
“our methodologies might not be as deliberate or as linear as they have been in 
the past.”³⁶ Depending on how the research question is approached and modified
over time, new searches for data have to be made, new tools to be tested,
datasets to be adapted and modified, and visualizations or interpretations to 
be revised and refined. 

To summarize, digital hermeneutics as a “hermeneutics of in-betweenness”³⁷
problematizes the many tensions between the analog and the digital, browsing 
and searching, scanning and reading, sharing and engaging, and accessibility 




33 Petri Paju, Mila Oiva, and Mats Fridlund, “Digital and Distant Histories. Emergent Approaches
within the New Digital History,” in Digital Histories: Emergent Approaches within the
New Digital History, ed. Mats Fridlund, Mila Oiva, and Petri Paju (Helsinki: HUP - Helsinki 
University Press, 2020), 3-18, here p. 5; Mareike König, “Die digitale Transformation als reflex- 
iver turn: Einführende Literatur zur digitalen Geschichte im Überblick,” Neue Politische Litera- 
tur 66, no. 1 (March 2021): 37-60. 

34 See the graphical research and training design 2019 underlying the programme, published 
2020 in: Fickers, “Update für die Hermeneutik”. 

35 van Zundert, “Screwmeneutics and Hermenumericals”; Ramsay, “The Hermeneutics of 
Screwing Around”. 

36 Gibbs and Owens, “Hermeneutics of Data and Historical Writing (Fall 2011 Version).” 

37 Andreas Fickers, “Hermeneutics of In-Betweenness: Digital Public History as Hybrid Prac- 
tice,” in Handbook of Digital Public History, ed. Serge Noiret, Mark Tebeau, and Gerben 
Zaagsma (Oldenbourg: De Gruyter, forthcoming). 
and interpretation that are inscribed into current practices of digital history.³⁸
Applied digital hermeneutics is as much a “theory of practice” as a “practice of
theory”:³⁹ by exploring the intellectual space in between the “unknown” and
the “familiar,” digital hermeneutics occupies exactly the space that the philosopher
of knowledge Hans-Georg Gadamer had identified as the “locus” of hermeneutics,
that is, its in-betweenness.⁴⁰


Turning theory into practice 


It is by undertaking heads-on and hands-on experiences that both students and 
supervisors can “grasp” the methodological and epistemological challenges inscribed
into the practices of digital hermeneutics. The training concept of the DTU-DHH
therefore followed the pedagogical principle of learning by doing.⁴¹ At the
core of this approach were the nine skills trainings offered during the project’s DH 
incubation phase. These trainings introduced the PhD students to the following 
topics: text mining; digital source criticism; database structures; introduction to 
programming with Python; data visualization; tool criticism; algorithmic critique; 
GIS analysis, mapping and cartography; and experimental media ethnography. 

In retrospect, one can argue that the skills trainings at least partially suc- 
ceeded in establishing a common ground for all DTU participants, by creating a 
shared set of practical knowledge originating from different disciplinary tradi- 
tions. This stimulated a transfer of knowledge and skills across the participants 
involved and contributed to a better understanding of how students who had 
trained in different epistemic communities were able, or not, to appropriate re- 
search concepts, methods, and tools from other disciplines. The training fur- 
thermore encouraged the PhD students to critically reflect on the use of digital 
methods and tools in their own research projects. By means of lectures and 


38 On the notion of inscription and the role of the digital infrastructures, objects, and tools as 
“actants,” see: Bruno Latour, Reassembling the Social: An Introduction to Actor-Network- 
Theory (Oxford, New York: Oxford University Press, 2005). 

39 Theodore R. Schatzki, Karin Knorr Cetina, and Eike von Savigny, eds., The Practice Turn
in Contemporary Theory (London, New York: Routledge, 2001). 

40 Hans-Georg Gadamer, Wahrheit und Methode: Grundzüge einer philosophischen Hermeneutik
(Tübingen: Mohr Siebeck, 2010[1960]), 300.

41 Jean Lave and Etienne Wenger, Situated Learning: Legitimate Peripheral Participation (Cam- 
bridge, UK: Cambridge University Press, 1991); Peter Heering and Roland Wittje, eds., Learning 
by Doing: Experiments and Instruments in the History of Science Teaching (Stuttgart: Franz 
Steiner Verlag, 2011). 
hands-on exercises, for instance, they learned and experienced how digital
tools (e.g. Voyant, QGIS, and Tableau) could be useful heuristic instruments for 
text analysis and data visualization, in general terms. But they simultaneously 
reflected on how these tools could potentially shape their own research practi- 
ces and interpretative frameworks. Yet the DH incubation phase did not serve 
everyone equally. Since the skills trainings came with a significant time investment,
the question of whether or not they should be compulsory was extensively
debated within the project team. Eventually, halfway through the
project’s first year, we decided to no longer make the training compulsory. Once 
the courses became optional, the PhD students could choose which to follow, 
based on an assessment of the relevance to their individual research projects. 

In the second and third years of the DTU, training formats were adapted to 
the specific needs of each researcher. The PhD students were encouraged to or- 
ganize workshops discussing specific aspects of their research projects or fields. 
In addition, a lecture series hosting international guest speakers was organized.⁴²
These formats were designed to be initiated by the PhD researchers
themselves, offering opportunities to meet individual training needs and to broaden
their academic networks. At the same time, these activities provided a
framework for fostering the constant exchange between DTU members and an 
academic public interested in joining the lectures or workshops. An interna- 
tional masterclass involving the scientific partner institutions of the DTU gener- 
ated constructive feedback for the PhD students in their third year and initiated 
synergetic discussions within the program.⁴³

Unsurprisingly, establishing the DTU as a collaborative working environ- 
ment also faced several challenges. One structural problem was that all the PhD 
students had a double affiliation. As members of the DTU, they were affiliated to 
the C²DH as hosting institution, which offered them both the “open space” and
the Digital History Lab as collaborative work spaces. In addition, the individual 
PhD students were affiliated to the faculty or department of their respective 
supervisors, where they were partly embedded into ongoing research and the 
teaching activities of their supervisors. This dual affiliation created a potential 
conflict of interest between the “DTU logic” and the “department logic.” The 
various disciplinary embeddings of the supervisors involved in the unit created 
some tensions in terms of expectations and responsibilities, which had to be 
mediated by the DTU management team. Some supervisors offered their PhD 


42 Doctoral Training Unit “Digital History and Hermeneutics”, https://dhh.uni.lu/category/activities/lecture-series/, accessed December 3, 2021.

43 Doctoral Training Unit “Digital History and Hermeneutics”, https://dhh.uni.lu/event/international-master-class-digital-history-and-hermeneutics/, accessed December 3, 2021.

students a second office in their departments, thereby creating a physical distance
between these students and the rest of the group working in the C²DH
open space. In our view, this constituted a crucial limitation to the trading 
zone concept as it fostered an atmosphere of individual rather than collective 
working environments. It took considerable effort in terms of project manage- 
ment and leadership to redirect this tendency and refocus the DTU on gaining 
common achievements. 

The coordinating postdoctoral researcher played a crucial role in mediating 
institutional tensions, aligning the team members in terms of expectation management,
and in organizing regular team meetings and team-building activities,
as well as in guaranteeing a constant flow of information.⁴⁴ Of importance for
the governance of the unit was the creation of a management team consisting of 
the head of the DTU, two supervisor professors, the coordinating postdoctoral re- 
searcher, and one representative of the doctoral students (the latter being elected 
by the PhD students and having a non-renewable term of one year). Following 
Anna Maria Neubert, navigating these interdisciplinary differences, including in 
terms of desirable outcomes and expected results, requires the use of professional
project management tools and techniques, as well as continuous investment
in communication, both face-to-face and through digital means.⁴⁵

Because we were aware of the key importance of close proximity and random encounters
for creativity and team-building, the coronavirus pandemic of 2020-2021
came as an unpleasant surprise to the project, forcing the team into a remote-working
mode during the successive shutdowns. Luckily, the crisis hit the DTU
in the final phase of the project, when most PhD students were focusing on 
writing their PhD dissertations and preparing their defenses. Although planned 
on-site workshops and lectures had to be canceled and new initiatives became 
nearly impossible, the team continued to discuss the progress of research proj- 
ects online and shared their experiences and the new challenges of work-life 
balance using online communication channels, such as Slack. With communica- 
tion moving entirely to online formats, the importance of physical co-location as 
a crucial element for interdisciplinary collaboration became obvious to all in a 
rather abrupt and unexpected way. Whereas the writing up of individual research
results was possible in remote working mode (although not without problems,
due to a lack of access to libraries and archives), it became increasingly arduous


44 Klein, Interdisciplining Digital Humanities, 138. 

45 Anna Maria Neubert, “Navigating Disciplinary Differences in (Digital) Research Projects 
Through Project Management,” in Digital Methods in the Humanities: Challenges, Ideas, Per- 
spectives, ed. Silke Schwandt (Bielefeld: Bielefeld University Press, 2020), 59-85. 

to keep the team spirit alive, something we had previously tried to actively promote
through team retreats and excursions.


Organization of this book 


This volume does not aim to offer a synthesis of the multilayered research activi- 
ties that have characterized the interdisciplinary setting of the DTU. Neither does 
it argue that there are “best practices” for how to organize such collaborative set- 
tings for doctoral training. While using the concept of digital hermeneutics as 
both an epistemological and a methodological framework for the project, we em- 
brace the “interpretative flexibility” of the different disciplinary appropriations of 
the concept that we see in the individual research projects. When looking at the 
thirteen contributions by the PhD students to this volume, we observe a great va- 
riety of ways in which the concept of digital hermeneutics has shaped individual 
research practices and how it has affected the interpretation of research results. 
While some PhD theses engage with the concept in a deeper theoretical or episte- 
mological manner, others demonstrate a more pragmatic translation of methods 
and tools between disciplinary domains and traditions. As all PhD theses in the 
DTU were designed by the PhD students and their supervisors as individual re- 
search projects, they have to be seen as independent projects — but nevertheless 
they also aim to speak to the larger research agenda of the DTU as a whole. For 
the purposes of this book though, all PhD students were asked to reflect more 
systematically on how the interdisciplinary setting of the DTU, with its many 
skills training and collaborative activities, had an impact on their individual PhD 
research projects. In addition, we encouraged the authors to think about the 
added value of the concept of digital hermeneutics as a heuristic tool, or inter- 
pretative framework, for their research. The book is therefore a continuation of 
the original effort by all DTU members to share experiences, to document strug- 
gles and failures, and to promote a self-reflexive approach to doing digital hu- 
manities and history research. These auto-ethnographic practices are intended to 
contribute to the growing interest in the pragmatics of digital hermeneutics and 
praxeological studies in the field of history and humanities.⁴⁶


46 See: Lucchesi, “For a New Hermeneutics of Practice in Digital Public History”; Herman 
Paul, “Performing History: How Historical Scholarship Is Shaped by Epistemic Virtues,” His- 
tory and Theory 50, no. 1 (2011): 1-19; Tracie L. Wilson, “Coming to Terms with History: Trans- 
lating and Negotiating the Ethnographic Self,” H-Soz-Kult, June 14, 2012. 

In the first section of this book, entitled “Hermeneutics of machine interpretation,”
we present five case studies originating from the fields of computational
linguistics, computer science, digital archaeology, and philosophy. The
common thread of these chapters is that they aim to disclose the added heuristic
and pragmatic value of computer science methods and tools for humanities
research: from historical network analysis in large-scale professional networks 
(Antonio Fiscarelli) to agent-based modeling in Stone Age settlement patterns 
(Kaarel Sikk), from natural language processing and argument-mining in politi- 
cal debates (Shohreh Haddadan) to word embeddings in literary studies and 
autobiographical writings (Ekaterina Kamlovskaya) and text mining and topic 
modeling in philosophical texts (Thomas Durlacher). 

The second section, headed “From ‘source’ to ‘data’ and back,” thematizes 
the many challenges historians face when modeling content for historical re- 
search by transforming complex, inconsistent, fragmented historical “sources” 
into structured data or unstructured datasets.⁴⁷ The case studies collected here
were originally intended to focus on a single step or phase in the research pro- 
cess, such as data search, curation, analysis, or visualization. But all the chapters 
in fact emphasize the non-linear and highly iterative nature of the hermeneutic 
exercise characterizing any research process: from “continuous searching” as 
gradual refinement of the research question (Eva Andersen) to the ephemeral na- 
ture of “living sources” such as place names (Sam Mersch), from fragmented da- 
tasets about Roman trade networks (Jan Lotz) to the “translation” of Renaissance 
paintings into a relational database (Floor Koeleman) and the problem of source 
abundance and digital asset management systems (Sytze Van Herck). 

The final section of the volume, called “Digital experiences and imaginations 
of the past,” problematizes the impact of digital tools and infrastructures in in- 
teracting with the past and simulating new environments that shape our histori- 
cal imagination. Historical research is increasingly challenged to reflect on new 
forms and formats of storytelling and engaging with the broader public — be it in 
schools, museums, or video games. In this section, we look at the pedagogical 
value of a 3D model of a medieval castle (Marleen de Kramer), the learning expe- 
rience of creating a mobile app walking tour on Jewish history (Jakub Bronec), 
and the importance of a user-centric design within digital museum contexts 
(Christopher Morse). 


47 Compare the experiences of humanist researchers of the SFB 1288 “Practices of Comparing: 
Ordering and Changing the World” at the Bielefeld University: https://www.uni-bielefeld.de/(en)/sfb1288,
accessed December 3, 2021. Cited in Silke Schwandt, ed., Digital Methods in the
Humanities: Challenges, Ideas, Perspectives (Bielefeld: Bielefeld University Press, 2020). 

We hope that this volume offers interesting insights into the laboratory of
digital history as an interdisciplinary endeavor. We would like to thank all 13 
PhD students for their willingness to share their thoughts and reflections, or, in 
other words, to allow us to have a view into their “digital kitchen”: turning the 
“raw” into the “cooked” is a process asking for creativity and rigor, conceptual
thinking and hands-on experience, and (in the specific case of this
Doctoral Training Unit) both team spirit and individual initiative.⁴⁸
The book is a thoughtful documentation of that “thinkering” process, aimed at 
both educating and encouraging other scholars in the rich trading zone of digital
humanities. As Patrik Svensson stated in 2012: “The digital humanities can
be seen as a twenty-first-century humanities project driven by frustration, dissatisfaction,
epistemic tension, everyday practice, technological vision, disciplinary
challenges, institutional traction, hope, ideals and strong visions.”⁴⁹ It
was in exactly this spirit that the Doctoral Training Unit “Digital History and 
Hermeneutics” was driven and experienced. It was, we believe, a worthwhile 
journey. 
Hermeneutics of machine interpretation


Antonio Maria Fiscarelli 

Social network analysis for digital 
humanities 

Challenges and use cases 


1 Introduction 


The field of digital humanities has grown rapidly in recent decades thanks to the 
greater availability of online digital sources, and new software and tools. Never- 
theless, there are still some challenges that must be faced. During the same period, 
and due to the growing computing power and availability of online databases, 
network analysis has gained popularity: researchers from different fields have 
jumped on the network science bandwagon, and words such as “network” and 
“complexity” have become increasingly common.

Network analysis can be used to model different systems such as biological 
networks,¹ the World Wide Web, organizations, and societies. A social network
can be described as a collection of “social actors” who are connected to each 
other if they form some sort of relationship. Social network analysis focuses on 
the relationships among these social actors and is an important addition to 
standard social and behavioral research, which is primarily concerned with the 
attributes of social units.” Not only is it important to acknowledge that social 
relationships are relevant, but also to understand how such ties work
and how they relate to the many underlying social mechanisms governing 
these networks. 

Social network analysis is one of the tools that have become particularly 
popular among humanities scholars. Even though social networks may seem to 
be a fairly recent invention, with the term calling to mind Facebook and other 
online platforms, they are in fact not limited to modern times.⁴ For example,
analysis of social networks has been used to model networks as diverse as
the marriage and business relationships of the Medici family in fifteenth-century
Florence,⁵ the evolution of women’s social movements in the nineteenth
century,⁶ the personal support network of Jewish refugees during
the Second World War,⁷ and visibility networks of Neolithic long barrows in
the United Kingdom.⁸

The rest of this article is organized as follows: Social network analysis and 
some of its tools are introduced in Section 2. Section 3 presents an in-depth re- 
view of the latest historical network research. Finally, a use case drawn from 
my collaboration with a historian colleague is presented in Section 4. 


1.1 Challenges in digital humanities 


The first challenge in digital humanities is of a methodological nature.⁹ On the
one hand, and particularly in the use of network analysis, there is a risk that
humanities research will limit itself to the “drawing of complicated graphs”,¹⁰
yet the use of a certain method or digital tool should not be the main objective
of research. On the other hand, some scholars may be hesitant to introduce dig- 
ital tools into their research, fearing that these will take them out of the realm 
of history. It is therefore important to understand what digital tools can really 
offer in support of historical research. 
The second challenge relates to the interdisciplinary nature of digital humanities.
Humanities research that uses digital tools can take two forms. In the first case,
scholars may show interest in a digital tool, start experimenting with it, and 
include it in their workflow. This approach could lead to the tool being used 
rather as a “black box”, i.e. given some input, the black box will produce a
certain output, while everything in between is unknown. Therefore, it will not 
be possible to understand how the tool works, how to interpret the output, or 
how to recognize any potential bias inherent in that tool. In the second case, 
scholars may seek help, or a collaboration with an expert from another field, 
for example a computer scientist with a solid background in a specific method 
or tool. In this case, there is the risk that the humanities scholar will become a 
simple “data provider” for the model maker.¹¹ It is also essential to find a common
vocabulary and be able to reconcile the two different perspectives in this
scenario. Only if this is achieved can the two researchers start negotiating new 
forms of knowledge and successfully undertaking historical research together. 
In fact, my role in this project was to assess all these issues and ensure a fruit- 
ful collaboration between humanists and computer scientists. 

Another issue relates to the data themselves. Historians nowadays have ac- 
cess to much larger amounts of data than their predecessors, whether from digi- 
tized classical sources (scans of books, digitized old photographs and recordings) 
or born-digital sources (websites, social networks). They can also access these 
sources at high speed and relatively low cost. For that reason, historians may be 
experiencing a paradigm shift, going from a scarcity to an abundance of sources,¹²
while traditional methods used by historians may be failing to deal with such a 
volume of information. One example of such methods is close reading, which 
may fail in its purpose when the researcher is faced with very large collections of 
texts without the support of computer-based techniques. The easy accessibility of 
data comes with new questions too. Which sources have been digitized, which 
were discarded and what criteria were used to select those retained? It is also im- 
portant to identify the origin of such sources. What was the provenance of the 
original sources? For born-digital sources, how were they generated? 

Data storage has also changed with the advent of the digital era. The use of 
new technologies has made storing data far easier — a single hard drive can 
now store thousands of documents, and is cheap, small, and easy to transport. 
It can be easy to think that digital data will last forever. Unfortunately, data 
stored in digital form do not have any intrinsic meaning without the specific
software or technology that can read them, and these technologies can become 
obsolete within a decade or even less. One may also think that digitally stored 
data are safe from aging. Indeed, unlike analog sources, digital data do not deteriorate.
However, a single malfunction of the storage volume could render an
entire data collection inaccessible and irretrievably lost.¹³


1.2 Project summary 


The main objective of my doctoral project is to show how humanities research can 
benefit from network analysis by providing PhD students from other disciplines — 
such as history, psychology, linguistics, and archaeology — with the right tools to 
help them answer their historical questions and by adapting these tools to their 
research projects. In this way, a fruitful collaboration is sought, where each side 
can benefit from the other: humanities scholars gain a critical understanding of 
digital tools and their functionalities, while computer scientists find new use cases 
and applications, at the same time learning to appreciate the needs of humanists. 
Understanding each other’s needs is crucial to the collaboration. Instead of two 
distinct groups with separate interests, I envision humanists and computer scien- 
tists joining forces to share their knowledge and expertise in order to tackle the 
new challenges that are emerging in digital humanities. Only with a common goal 
and a shared vision can this collaboration be effective and still worth the time and 
effort required. 


2 Social network analysis 


Historically, the first encounter with network analysis is seen in the “Seven 
Bridges of Königsberg” problem.¹⁴ The then Prussian city of Königsberg was
built on four main areas: the two sides of the Pregel River and two small is- 
lands, connected by seven bridges. The problem consisted in finding a route 
that reached all the areas of the city by crossing each bridge exactly once. Euler 
modeled this problem using what we now call graph theory, representing the
city areas as nodes and the bridges as edges connecting nodes, and proved
that it has no solution.
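
Euler's argument can be checked mechanically. The following minimal Python sketch assumes the usual textbook encoding of the seven bridges as a multigraph (the area labels are illustrative and not taken from the chapter). It applies the degree-parity criterion: a connected graph admits a walk that uses every edge exactly once only if either zero or two of its nodes have an odd degree.

```python
from collections import Counter

# Illustrative encoding of the seven bridges: two river banks and two islands.
# The labels are hypothetical; only the connection pattern matters.
BRIDGES = [
    ("north_bank", "island_a"), ("north_bank", "island_a"),
    ("south_bank", "island_a"), ("south_bank", "island_a"),
    ("north_bank", "island_b"), ("south_bank", "island_b"),
    ("island_a", "island_b"),
]


def node_degrees(edges):
    """Count the edge endpoints at each node (parallel edges count separately)."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def has_euler_walk(edges):
    """Degree-parity test; assumes the graph is connected, as this one is."""
    odd = [node for node, d in node_degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(node_degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

All four areas have an odd number of bridges, so the test fails, which matches Euler's conclusion.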


2.1 Complex networks 


Complex networks are those that exhibit unusual properties that make them dif- 
ferent from other, simple networks. Some of these properties have played an im- 
portant role in the development of the field of social network analysis and are 
worth examining. 


2.1.1 Some definitions 


A graph, or network (the terms are often used interchangeably), can be directed 
or undirected, depending on whether the direction of a connection is relevant. It 
can also be weighted or unweighted, where the weight represents cost, strength, 
or the importance of a connection. 

The degree of a node vᵢ represents the number of incident edges it possesses,
in other words the number of the node’s direct connections. In the
case of a directed network, its in-degree and out-degree are also defined, and 
these refer to the number of ingoing or outgoing edges of a node. 

The average path length of a network is defined as the average length of the shortest
paths between all pairs of nodes in that network. The diameter of a network is defined
as the length of the longest of these shortest paths. These two metrics represent how easily
information can travel through a network.

The clustering coefficient of a network is defined as the average local clustering
coefficient of the nodes in the network. The local clustering coefficient of a node is the
ratio of the number of triangles that include the node to the number of triples centered on the node.¹⁵
This metric is related to the concept of transitivity: given that vᵢ is connected to vⱼ,
and vⱼ is connected to vₖ, what are the odds that vᵢ is also connected to vₖ?
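
To make these definitions concrete, here is a small Python sketch that computes them for a toy undirected, unweighted network. The four-node example and all function names are illustrative assumptions. Nodes with fewer than two neighbors are given a local clustering coefficient of zero, which is one common convention among several.

```python
from collections import deque
from itertools import combinations

# Toy network: a triangle (A, B, C) plus a pendant node D attached to C.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: edges on a shortest path from source to each node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def path_metrics(adjacency):
    """Average shortest-path length and diameter of a connected network."""
    lengths = [
        shortest_path_lengths(adjacency, u)[v]
        for u, v in combinations(sorted(adjacency), 2)
    ]
    return sum(lengths) / len(lengths), max(lengths)


def local_clustering(adjacency, node):
    """Share of neighbor pairs of `node` that are themselves connected."""
    pairs = list(combinations(sorted(adjacency[node]), 2))
    if not pairs:
        return 0.0
    closed = sum(1 for u, v in pairs if v in adjacency[u])
    return closed / len(pairs)


adjacency = build_adjacency(EDGES)
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
average_path_length, diameter = path_metrics(adjacency)
clustering = sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)
print(degree, average_path_length, diameter, clustering)
```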


2.1.2 Small world phenomenon


The small world phenomenon was first identified during Milgram’s experiments 
regarding social networks.¹⁶ The experiments’ objective was to send a letter from
a source person in Nebraska to a target person in Massachusetts. The source per- 
son was asked to send the letter to whichever of their acquaintances was most 
likely to be connected to the target person, with the objective of reaching the tar- 
get within as few steps as possible. Milgram noticed that source and target were, 
on average, between five and six people apart. This average path length figure 
was much lower than the number of people involved in the experiments, and be- 
came associated with the term “six degrees of separation.” 

Later on, Watts and Strogatz discovered that many real-world networks
(such as the Western US power grid, the brain network of the nematode species
C. elegans, and the World Wide Web), even though of different types, all had
the same two properties: a low average path length and a high clustering coefficient.¹⁷
The network models known at that time (regular lattices and the random
network model developed by Erdős and Rényi¹⁸) failed to capture these
properties. In fact, regular lattices have high average path lengths and high 
clustering coefficients, while random networks have low average path lengths 
and low clustering coefficients. Watts and Strogatz proposed a model that, 
starting from a regular lattice, randomly rewires edges according to a certain 
probability p between zero and one. If this probability is properly chosen, the 
model can generate small-world networks. In fact, these networks still pre- 
serve the high clustering coefficient of regular lattices, but the rewiring of a 
few edges makes the distance between nodes much smaller. 
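
The rewiring procedure described above can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' reference implementation: the function name and parameters (n nodes, k neighbors per node, rewiring probability p) are assumptions of this sketch, and it expects k to be even and smaller than n.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes (each linked to its k nearest neighbors),
    in which every edge is rewired with probability p."""
    rng = random.Random(seed)
    adjacency = {node: set() for node in range(n)}
    # Step 1: build the regular ring lattice.
    for node in range(n):
        for offset in range(1, k // 2 + 1):
            neighbor = (node + offset) % n
            adjacency[node].add(neighbor)
            adjacency[neighbor].add(node)
    # Step 2: visit every lattice edge once and rewire its far end with probability p.
    for offset in range(1, k // 2 + 1):
        for node in range(n):
            old = (node + offset) % n
            if rng.random() < p and old in adjacency[node]:
                # Avoid self-loops and duplicate edges.
                candidates = [c for c in range(n)
                              if c != node and c not in adjacency[node]]
                if candidates:
                    new = rng.choice(candidates)
                    adjacency[node].discard(old)
                    adjacency[old].discard(node)
                    adjacency[node].add(new)
                    adjacency[new].add(node)
    return adjacency


lattice = watts_strogatz(20, 4, 0.0, seed=1)   # p = 0: regular lattice
rewired = watts_strogatz(20, 4, 0.3, seed=1)   # some edges rewired
print(sum(len(nb) for nb in rewired.values()) // 2)
```

Rewiring moves edges but never adds or removes any, so the number of edges stays at n·k/2 for every value of p.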


2.1.3 Scale-free networks 


Barabási and Albert noticed that, for many complex networks, the degree distribution
does not follow a Poisson distribution with a peak around the mean value,
but rather a power-law distribution.¹⁹ This means that a very small number of

16 Stanley Milgram, “The Small World Problem,” Psychology Today 2, no. 1 (1967): 60-7. 

17 Duncan J. Watts and Steven H. Strogatz, “Collective Dynamics of ‘Small-World’ Networks,” 
Nature 393, no. 6684 (1998): 440. 

18 Paul Erdős and Alfréd Rényi, “On the Evolution of Random Graphs,” Publications of the
Mathematical Institute of the Hungarian Academy of Sciences 5, no. 1 (1960): 17-60.

19 Albert-László Barabási and Réka Albert, “Emergence of Scaling in Random Networks,”
Science 286, no. 5439 (1999): 509-12.




nodes (or hubs) in the network have a very high degree, something that the
Watts-Strogatz model failed to capture. Barabási and Albert realized that many real-world
networks show preferential attachment: nodes do not connect randomly
but, rather, favor more “popular” nodes. For example, novice researchers in a col- 
laboration network are more likely to aim to collaborate with researchers who are 
further on in their careers and already have many connections. Furthermore, complex
networks are not static but instead grow in size: every year, new researchers
start their careers and are added to the network. Barabási and Albert
proposed a model that, based on these two mechanisms, can generate networks 
with a power-law degree distribution. The network starts with a fixed number of 
nodes. New nodes are then added and are connected to other nodes with a proba- 
bility based on their degree. The networks generated with this model are called 
scale-free networks. 
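
The two mechanisms, growth and preferential attachment, can be illustrated with a short Python sketch. The function name, the parameter m (links per new node), and the choice of a small fully connected starting core are assumptions of this sketch and are not specified in the text above.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Grow a network to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a fully connected core of m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every node appears here once per incident edge, so sampling uniformly
    # from this list is sampling proportionally to degree.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


edges = preferential_attachment(50, 2, seed=0)
degree = Counter(node for edge in edges for node in edge)
print(degree.most_common(3))  # the best-connected nodes act as hubs
```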


2.1.4 Emergence of communities in complex networks 


Another important property of complex networks is their organization into com- 
munities. A community consists of a group of nodes that are highly connected to 
each other but loosely connected to the rest of the network.²⁰ For example, researchers
in a collaboration network tend to connect to other researchers in the
same field, resulting in the emergence of communities that represent similar re- 
search topics. Communities can be disjoint if nodes can only belong to a single 
community, or overlapping if they can belong to many. 


2.1.5 The importance of weak ties 


So far, we have seen that complex networks show high transitivity. Because of 
transitivity, nodes become highly connected to each other — and as a conse- 
quence, the network self-organizes into communities. We have also seen that, in 
a complex network, the average path length is low. Therefore, some connections
must act as “bridges” between communities. These connections
are called weak ties. Sociology identifies two different kinds of ties in social net- 
works: strong ties represent established interpersonal relationships, and are 
found in intracommunity connections; weak ties represent acquaintances, and 
are found in intercommunity connections. Granovetter, in his study, showed that 
people are more likely to find a new job through their acquaintances than
through close friends.²¹ This showed that weak ties are very important when it
comes to the transmission of information within the network. While individuals 
in the same community can only share information that most of them probably 
already know, acquaintances can provide access to novel information. 


2.2 Centrality metrics 


Centrality metrics represent an important tool for the analysis of social networks. 
These metrics are defined on the nodes, and they rank nodes according to their 
position in a network.²² Degree centrality measures the number of direct connections
of a node and can be used to identify actors who are highly connected.
Betweenness centrality is computed as the number of shortest paths between any
two nodes in the network that go through a certain node. It measures to what ex- 
tent an actor has control over the information flowing between other actors and 
can be used to identify those actors who occupy strategic positions in the network 
in terms of information exchange. Closeness centrality is computed from the average
shortest-path distance between a node and every other node in the network (usually as its inverse), and measures
how quickly information can flow from one node to the rest of the network.
The first person to experiment with centrality metrics was Bavelas, who
showed that centrality measures were linked with group performance and that
centrality metrics can help identify people with different roles in the network.²³
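
The three metrics can be sketched in plain Python as follows. The toy network (two triangles joined by a single tie between C and D) and the function names are illustrative assumptions. Betweenness is computed with Brandes' algorithm and, where several shortest paths exist between a pair, counts the fraction passing through a node. Closeness is taken here as the inverse of the average distance, which is one of several conventions in use.

```python
from collections import deque

EDGES = [("A", "B"), ("B", "C"), ("A", "C"),   # first triangle
         ("D", "E"), ("E", "F"), ("D", "F"),   # second triangle
         ("C", "D")]                           # bridging tie

adjacency = {}
for u, v in EDGES:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def degree_centrality(adjacency):
    """Number of direct connections of each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def closeness(adjacency, node):
    """Inverse of the average shortest-path distance to all reachable nodes."""
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


def betweenness(adjacency):
    """Brandes' algorithm for unweighted, undirected networks (unnormalized)."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search counting shortest paths (sigma) per node.
        distance = {source: 0}
        sigma = {node: 0 for node in adjacency}
        sigma[source] = 1
        predecessors = {node: [] for node in adjacency}
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    sigma[neighbor] += sigma[node]
                    predecessors[neighbor].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += sigma[pred] / sigma[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each unordered pair of nodes was counted from both of its endpoints.
    return {node: value / 2 for node, value in score.items()}


deg = degree_centrality(adjacency)
btw = betweenness(adjacency)
print(deg["C"], btw["C"], closeness(adjacency, "C"))
```

In this toy network, C and D have the same degree as one another but sit on every shortest path between the two triangles, so they are the only nodes with non-zero betweenness.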


2.3 Orbit analysis 


Graphlets are small connected graphs with a size of between two and five 
nodes. Graphlet analysis is a useful tool for analyzing the global topological 
structure of networks and, locally, of a node’s ego network. Figure 1 shows all 
the graphlets with up to four nodes. Some well-known examples are the “star” 
graphlet and the “triangle” graphlet. Some graphlets are characteristic of certain 
types of network. For instance, the triangle is more likely to be found in social networks,
due to high transitivity, while the star is more likely to be found in visibility
networks. Graphlet counts, defined as the number of times that each graphlet
appears in a network, can be used to characterize networks. 

Nodes within a specific graphlet can have different roles, known as orbits. For example, in the
star graphlet, one node can be identified as the center and the other three nodes
as the leaves. Similarly to graphlet counts, an orbit count can be defined as the number of times a
node appears in each orbit, and can be used to identify groups of nodes that play
different roles in the network. The orbit count for the central position of the
“brokerage” graphlet can, for instance, be used to identify “mediator” nodes in 
collaboration networks. 

Fig. 1: Graphlets with up to four nodes, with their different orbits. 2020. © Antonio Fiscarelli.
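
As an illustration of orbit counting, the sketch below handles only the two three-node graphlets (the open path and the triangle). The orbit labels are informal names chosen for this sketch, and treating the center of an open three-node path as the "brokerage" position is an assumption, not something confirmed by the chapter.

```python
from itertools import combinations

# Toy network: a hub linked to a, b and c, with one extra tie between a and b.
EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]

adjacency = {}
for u, v in EDGES:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def three_node_orbits(adjacency):
    """Count, per node, its appearances in each orbit of the 3-node graphlets."""
    counts = {node: {"path_end": 0, "path_center": 0, "triangle": 0}
              for node in adjacency}
    for center, neighbors in adjacency.items():
        for u, v in combinations(sorted(neighbors), 2):
            if v in adjacency[u]:
                # u, v and center are mutually connected: a triangle.
                counts[center]["triangle"] += 1
            else:
                # u and v are linked only through center: an open path.
                counts[center]["path_center"] += 1
                counts[u]["path_end"] += 1
                counts[v]["path_end"] += 1
    return counts


orbits = three_node_orbits(adjacency)
print(orbits["hub"])
```

Nodes with a high "path_center" count connect neighbors that are not otherwise linked, which is the kind of mediating role mentioned above.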


2.4 Exponential random graph models 


Exponential random graph models (ERGMs) are a family of statistical models
that help us discover and understand the processes underlying network formation.²⁴
They have been used extensively in social network analysis and are
popular in various fields such as sociology,²⁵ archaeology,²⁶ and history.²⁷ ERGMs
provide a model for networks that includes covariates (variables that relate to
two or more nodes), which cannot be addressed using traditional methods. They
can represent effects such as:

— homophily: the tendency of similar nodes - i.e. nodes having the same at- 
tributes — to form relationships. 

— mutuality: the tendency of node B to form a relationship with node A, if 
node A is connected to node B. 

— triadic closure: the tendency of node C to form a relationship with node A, 
if node A is connected to node B, and node B is connected to node C. 


ERGMs also provide maximum-likelihood estimates for the parameters governing 
these effects. For example, they can estimate the increased likelihood of a tie ex- 
isting between two nodes when these nodes have the same attributes. ERGMs also 
provide a “goodness-of-fit” test for the model, in order to verify whether the ef- 
fects included in the model are sufficient to explain the structure of the observed 
network. Furthermore, they can simulate networks that match the probability dis- 
tributions estimated by the model. In other words, they can be used to generate 
artificial networks that reflect the characteristics of the observed network. 
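
For readers who prefer a formula, the general form of an ERGM is usually written as below. The notation is the conventional one from the statistical literature and is not taken from this chapter: y is an observed network, g(y) a vector of network statistics (for example counts of edges, same-attribute ties, mutual ties, or triangles), θ the corresponding parameters, and κ(θ) a normalizing constant summing over all possible networks on the same nodes.

```latex
% General ERGM form (conventional notation, assumed for illustration)
P_{\theta}(Y = y) = \frac{\exp\left(\theta^{\top} g(y)\right)}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\left(\theta^{\top} g(y')\right)

% Conditional log-odds of a single tie between i and j, given the rest of the network;
% \delta_{ij}(y) is the change in g(y) when the tie is switched from absent to present
\operatorname{logit} P_{\theta}\left(Y_{ij} = 1 \mid Y_{-ij} = y_{-ij}\right)
  = \theta^{\top} \delta_{ij}(y)
```

Read this way, a positive homophily parameter means that a tie which creates one more same-attribute pair has higher log-odds of existing, all else being equal.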


3 Current trends in historical network analysis 


There are already several examples of historians incorporating network analysis 
into their research. In this section I review some of their work, including how 
they translated historical questions into a social network analysis perspective, 
and identify what I consider to be the missed opportunities in these studies. 

Breure and Heiberger, in their study, argue that eponyms serve as a 
proxy for contact and are a promising way to explore historical relationships 
between natural scientists.²⁸ Eponyms are used in taxonomy when an author
describes a new species for which they use the name of a person - usually a 
field collector or colleague. 

Breure and Heiberger tested this hypothesis on the community of malacolo- 
gists (i.e. zoologists studying mollusks) in the nineteenth century, analyzing 
the recorded activity of malacological authors between 1850 and 1870. The da- 
taset used contained authors’ information such as age and home country, as 
well as performance measures like their numbers of publications, pages, coau- 
thored publications, and coauthors. Each connection between authors was 
classified as an eponym, an exchange of material, or a coauthorship. Therefore,
these authors had, effectively, built a collaboration network, in particular a 
multiplex network, where nodes interact within different layers (depending on 
the type of interaction) but there is no interaction between the different layers 
themselves. This network, consisting of 476 nodes and 1,822 edges, can be con- 
sidered of medium size. The authors in the network were ranked according to 
their number of publications, and elite authors were identified as those who 
contributed to 80 percent of the total publications. 

Breure and Heiberger noticed that few authors published a large number of 
papers, something that has been widely recognized in bibliometrics. They also 
identified two heavily linked communities that represented authors dealing with 
recent shells and those dealing with fossil (paleontological) shells. They manually 
assigned authors to one of the two communities, depending on their research
interests. It would have been interesting to run a community detection algorithm
and compare the communities it finds with those identified by the authors, using
metrics such as normalized mutual information or the adjusted Rand index to
quantify the agreement between the two partitions, and thus assess any bias in
the manual assignment.
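
The comparison suggested above can be sketched in a few lines of Python. This is a minimal illustration, not part of the reviewed study: the two label lists are invented, and the normalization of mutual information by the arithmetic mean of the two entropies is one common choice among several.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    joint = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by the mean entropy of the two partitions."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a == 0 or h_b == 0:                # at least one partition is trivial
        return 1.0 if h_a == h_b else 0.0
    return mi / ((h_a + h_b) / 2)


# Hypothetical example: six authors, assigned manually and by an algorithm.
manual = ["recent", "recent", "recent", "fossil", "fossil", "fossil"]
detected = [0, 0, 1, 1, 1, 1]
print(adjusted_rand_index(manual, detected))
print(normalized_mutual_information(manual, detected))
```

Both scores equal 1 when the two partitions agree perfectly, regardless of how the groups are named, and are close to 0 when the agreement is no better than chance.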

The authors used ERGMs to find out what effects had shaped the network of 
collaboration and found that authors from the same country were more likely to 
connect with each other, and that higher publication numbers increased the 
odds of a tie between authors. They also discuss how eponyms could result in a 
collaboration between authors, but this hypothesis was not tested, even though 
ERGMs offer the possibility of testing whether a tie in one layer increases the 
odds of a tie in a different layer. 

Fernandez Riva introduced a new method, based on network analysis, for analyzing
the shared manuscript transmission of medieval German texts. Medieval manuscripts
contain several texts that were brought together according to certain criteria,
both cultural (common genre) and practical (availability, size, etc.), rather
than being randomly grouped. Fernandez Riva modeled the
transmission of shared manuscripts as a network, where nodes represent texts 
that are deemed connected if they appear in the same manuscript, and a weight
is assigned if texts appear together in more than one manuscript. He does not
mention the size of the network; however, he specifies that the largest connected
component of the network included 76 percent of the nodes, while several smaller 
components (of two to eight nodes) included 6 percent of the nodes, and the re- 
maining 18 percent consisted of isolated nodes. Fernandez Riva decided to name 
these three different parts of the network “Continent,” “Archipelagos,” and “Is- 
lands.” He proceeded by applying a community detection algorithm on the largest 
component to identify communities, although the algorithm used is not mentioned.
Since no attribute data (such as genre, time, or location) were available for
the nodes, the author manually inspected the outcome of the algorithm to verify
whether any of these characteristics correlated with the communities found, and
came to the conclusion that there was a high overlap between communities, even 
for different genres. He used eigenvector centrality to identify texts that tended to 
appear in large collections, and betweenness centrality to identify texts that con- 
nected different communities in the network and fitted into more than one genre. 
These metrics helped him identify texts that occupied strategic positions in the 
network, something that would have been impossible by human inspection alone.
Although the author does not apply statistical methods to the network of
interest, limiting his work to visualizing the network and computing centrality
metrics, it must be recognized that the data available to him were rather
limited.
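
To make the construction concrete, the following minimal sketch builds a weighted co-occurrence network of texts from a table of manuscripts and splits it into connected components. The manuscript and text names are invented, and the code is an illustration of the general idea rather than Fernandez Riva's implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(manuscripts):
    """Return (nodes, weights); weights[(a, b)] counts shared manuscripts."""
    nodes, weights = set(), defaultdict(int)
    for texts in manuscripts.values():
        unique = sorted(set(texts))
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
    return nodes, dict(weights)


def connected_components(nodes, weights):
    """Return the components as sets of nodes, largest first."""
    adjacency = {node: set() for node in nodes}
    for a, b in weights:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical data: which texts appear in which manuscript.
manuscripts = {
    "MS1": ["Text A", "Text B", "Text C"],
    "MS2": ["Text B", "Text C"],
    "MS3": ["Text D", "Text E"],
    "MS4": ["Text F"],
}
nodes, weights = build_cooccurrence_network(manuscripts)
components = connected_components(nodes, weights)
continent = components[0]                                  # largest component
archipelagos = [c for c in components[1:] if len(c) > 1]   # small groups
islands = [c for c in components[1:] if len(c) == 1]       # isolated texts
```

In this toy example, "Text B" and "Text C" share two manuscripts and therefore receive a weight of 2, while "Text F" remains an isolated node.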

Valleriani et al. analyzed the emergence of epistemic communities during 
the early modern period. They worked on a corpus of printed cosmology textbooks
used at European universities, dividing each book into several text parts,
representing “atoms” of knowledge. The authors built a directed, weighted, mul- 
tilayer network where nodes represented books that were connected to each 
other, on different layers, if they contained text parts that reoccurred in time 
(i.e. if they contained the same text, adaptations or translations of the same text, 
commentaries on the same text, or commentaries on the same adaptation), for a 
total of five layers. The network was a directed one, with the directionality being 
chronological, from older to more recent occurrences. The weight of connec- 
tions, on the other hand, was given by the number of text parts that reoccurred 
in two different books. The corpus contained 563 text parts, but the authors de- 
cided to consider only those parts reoccurring at least once, and with at least 
one year between reoccurrences. Therefore the network, which can be considered
of small-to-medium size, consisted of 239 text parts and 1,625 reoccurrences.
The authors also analyzed the aggregated graph, which included the same set of
nodes, with two nodes deemed connected if they were connected in any of the
five layers. The authors performed a longitudinal analysis by first looking at
the age distribution of connections for each layer of the network (computed as
the difference between the years of publication of the two text parts at the
ends of each connection) and found substantial differences between layers.
They then looked at the various connected components of the network in order 
to identify the different epistemic communities. Using a series of plots, they ana- 
lyzed the distribution of nodes’ out-degrees, normalized by the publication date 
of the text. For each plot, the visualization was further enhanced with different 
colors representing the nodes’ attributes such as in-degree, publication place, 
book format, and network layer. The analysis is followed by an in-depth inter- 
pretation of the results, and discussion on the emergence and evolution of the 
different families of editions. Again, the methodology provided is based more on
data visualization than on statistical analysis or advanced modeling techniques.

Cline has used social network analysis to study political life in Athens
between the 460s and 450s BC. She builds three increasingly broad social
networks using selected biographies from Plutarch’s Lives, from which she
retrieves all actors and their interrelationships. The first network uses Plutarch’s 
“Life of Pericles” and consists of 54 actors and 79 ties, which essentially equates 
to Pericles’ ego network. She then enlarges this by adding actors from “Life of
Alcibiades.” This second version of Athens’ social network contains 106 nodes 
and 145 connections. Lastly, she includes “Life of Cimon” and “Life of Nicias,” 
for a total of 133 nodes and 191 ties across this largest network, formed from all 
four biographies’ actors. These networks are all of a small size, undirected, and 
unweighted. The author says she is working with a multiplex network, since ties 
between actors are of different natures (family, work, friendship), even though 
there is no distinction between these ties in the analysis. Her objective is to dem- 
onstrate that the social network of Athens’ political life was a small world. Her 
argument is that democratic institutions in Athens enabled people belonging to 
different circles and social classes to meet, hence favoring innovation and the 
diffusion of new ideas. From a network perspective, this would be reflected in
Athens’ social network having a low average path length, a high level of
transitivity, and a core-periphery structure where the degree distribution
follows a power law, with few highly connected nodes and most nodes having a
low degree. Indeed, she computes transitivity, average path length, and
diameter for all the networks, and
compares them with the same quantities computed on a random network of the 
same size. All these measures confirm that Athens at the time was indeed a small 
world. For the core-periphery structure, Cline computes the degree distribution 
but does not perform any statistical tests to verify whether a power law is the 
best fit. She also computes betweenness for each actor to confirm that women 
tend to occupy central positions in the network, connecting different families via 
marriage. For this work, information such as gender, family, and social status 
was available. Therefore, it would have been interesting to test whether any of 
these attributes had an influence on the network of connections. 
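
The small-world comparison described above can be sketched as follows. This is a minimal, self-contained illustration on an invented toy network. It assumes that the random reference network has the same numbers of nodes and edges as the observed one, which is one common choice and not necessarily the procedure Cline followed.

```python
import random
from collections import deque
from itertools import combinations


def to_adjacency(nodes, edges):
    """Build an undirected adjacency map from a node list and an edge list."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def transitivity(adjacency):
    """Share of connected triples (paths of length two) that are closed."""
    closed, triples = 0, 0
    for neighbours in adjacency.values():
        k = len(neighbours)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(neighbours, 2)
                      if b in adjacency[a])
    return closed / triples if triples else 0.0


def average_path_length(adjacency):
    """Mean shortest-path length over all pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:                       # breadth-first search from source
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0


def random_graph_like(adjacency, seed=0):
    """Random graph with the same numbers of nodes and edges (assumption)."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    n_edges = sum(len(v) for v in adjacency.values()) // 2
    edges = rng.sample(list(combinations(nodes, 2)), n_edges)
    return to_adjacency(nodes, edges)


# Toy network: two triangles joined by a single bridging tie.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
observed = to_adjacency(range(1, 7), toy_edges)
reference = random_graph_like(observed, seed=1)
print(transitivity(observed), average_path_length(observed))
print(transitivity(reference), average_path_length(reference))
```

A network is usually called a small world when its transitivity is clearly higher than that of the random reference while its average path length is comparably short. In practice the comparison should be averaged over many random networks rather than a single one.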

Schauf and Escobar Varela used network analysis techniques to identify
characters who play structural roles in the Javanese wayang kulit incarnation of
the Mahabharata epic, which involves representations of a series of stories
from the epic, here called lakon. They build a weighted, undirected co-occurrence
network, where nodes represent the characters of the epic and these characters are 
deemed connected if they are mentioned in the same scene of any story. Weights 
indicate how many times two characters appear in the same scene. Each node is 
enriched with several attributes such as characters’ tribe affiliation, origin, species, 
and gender. The authors also build two different null models that preserve, on av- 
erage, the degree distribution of nodes. They compute betweenness centrality and 
closeness centrality for each character in the empirical network, as well as in the 
two null models. In this way, it is possible to identify outliers whose centrality val- 
ues are significantly higher or lower than expected, i.e. compared to the same 
quantity computed in the null models. For example, the authors find that female 
characters, despite being few in number and appearing relatively infrequently, 
seem to dominate the top ranks for betweenness. They also propose a variation of 
these centrality metrics that is based on the attributes of nodes. For example, the 
inter-faction betweenness centrality is used to identify those characters who act as 
“bridges” within their tribe, while the faction-world betweenness centrality identi- 
fies characters who act as bridges between their tribe and the rest of the network. 

One of the challenges of historical network research is dealing with missing
and incomplete data. Network data have to be extracted from sources such as
books, bibliographies, and diaries that were originally analog and only
digitized later, if needed. These sources are often incomplete or do not
provide enough information to build the network of interest. Additionally,
missing data are more critical in network research than in social and
behavioral research. Even a small portion of missing data can be problematic if
those data relate to crucial nodes (see hubs in Section 2.1.3) or ties (see
weak ties in Section 2.1.5). The situation is very different for historical
research working with born-digital data, such as online databases or data
scraped from social networks, where data are rather abundant.


4 Use case: Gender and ethnic collaboration 
patterns in a temporal co-authorship network 


Sytze Van Herck is one of the PhD students at the University of Luxembourg’s 
doctoral training unit in digital history and hermeneutics. Her main research
interests are intersectionality and gender within the history of computing, and
her work examines occupational segregation, working conditions, and gender
stereotypes in advertising from the 1930s until the end of the 1980s. Sytze and
I applied social network analysis techniques to analyze the gender and
ethnicity gap in computer science research.³⁶ During the last few decades many
bibliographic databases containing the publication records of scientists from
different fields have
been published online. Starting from these records, a collaboration network can 
be built where nodes represent authors, and authors are deemed connected if 
they have coauthored one or more papers together. This network of scientists can 
provide many insights into collaboration patterns in the academic community. 

The dataset that Sytze and I used for this use case was derived from a publicly
available snapshot of the DBLP bibliographic database taken on 17 September
2015.³⁷ It contains 112,456 papers, written by 126,094 authors and published at
81 different computer science conferences between 1960 and 2015. The dataset
includes author gender, which was generated by the Genderize API based on the
first forename of an author.³⁸ For ethnicity data we


36 Sytze Van Herck and Antonio Maria Fiscarelli, “Mind the Gap: Gender and Computer
Science Conferences,” in This changes everything – ICT and Climate Change: What can we do? IFIP
International Conference on Human Choice and Computers, ed. David Kreps et al. (Cham: 
Springer Nature Switzerland, 2018), 232-49. 

37 Swati Agarwal et al., “DBLP Records and Entries for Key Computer Science Conferences,”
Mendeley Data, V1, 2016. 

38 Genderize API, accessed April 21, 2021, https://genderize.io. 

decided to use the R package wru, which uses the algorithm implemented by Imai
and Khanna to predict ethnicity based on last name and gender.³⁹

Our research was driven by the following questions:

- Do minorities in computer science demonstrate different collaboration 
patterns? 

— As we saw in Section 2.1.1, metrics such as clustering coefficient, average path 
length, and diameter can characterize entire networks. A large clustering coef- 
ficient can be used to identify densely connected networks with high transitiv- 
ity, while low average path length and diameter can identify networks in 
which information flows faster. For this reason, we decided to extract male 
and female subnetworks from the dataset, as well as networks of white re- 
searchers and researchers of color, by considering only the nodes with the se- 
lected attribute and the connections between those nodes. We then computed 
clustering coefficient, average path length, and diameter on these networks 
and compared the results. We found that female researchers formed a more
close-knit network than male researchers, and that white researchers, even
though they were not a minority, showed similar behavior.

- Do minorities in computer science struggle to be successful? 

— The metrics commonly used to quantify the success or popularity of a re- 
searcher are based on the numbers of their publications and citations. We 
decided, instead, to use network metrics (presented in Section 2.2) that 
were based on the position that researchers occupied in the coauthorship 
network and metrics based on a researcher’s ego network structure. We 
computed some local network metrics such as betweenness centrality, 
closeness centrality, local clustering coefficient, and degree centrality, and 
then ranked male and female researchers, as well as white researchers and 
researchers of color. We found that female researchers generally scored 
lower than their male counterparts in terms of network connections, and 
had more closely knit networks. However, the female researchers ranked at the
top obtained better results than their male counterparts. Researchers of
color, who were mostly Asian researchers, occupied more strategic and central
positions in collaborations, outperforming white researchers.

- Do minorities play different roles in the network? 

- To answer this question we used orbit analysis (discussed in Section 2.3) to 
compute the average orbit count for female and male researchers, as well as 
for white researchers and researchers of color, and compared the results. We 


39 Kosuke Imai and Kabir Khanna, “Improving Ecological Inference by Predicting Individual 
Ethnicity from Voter Registration Records,” Political Analysis 24, no. 2 (2016): 263-72. 

found that male researchers dominated central roles, corresponding for example
to the central orbit in the star graphlet, while female researchers tended to
occupy the peripheral positions. In particular, in the brokerage graphlet, male 
researchers more often occupied a brokerage position, corresponding to the 
central orbit of this graphlet, while a pair of female researchers and an individ- 
ual female researcher were more likely to be found in the peripheral orbits of 
the same graphlet — implying the male researcher played a mediating role be- 
tween these female researchers. 

- Does the minority bias become mitigated over time? 

— We built a temporal version of the coauthorship network and answered the 
same questions to see if there were any changes over time. Firstly, we found 
that the size of minority groups had expanded over time, with their intragroup 
homophily increasing even faster. Female researchers performed better at 
higher ranks only during specific periods, such as in the middle of the 1980s 
and toward the end of the 1990s. The trend for ethnicity, on the other hand, 
inverted over time: researchers of color, mostly Asian, occupied more central 
positions until the mid-1990s, while they have become more closely knit in re- 
cent years. In the orbit analysis we found that gender differences had narrowed 
over time, while we observed a complete inversion of the trend for ethnicity. 
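
The subnetwork comparison used for the first question can be sketched as follows. This is an illustrative toy example, not the code used in the study: the author names, the attribute labels, and the choice of the average local clustering coefficient as the single metric shown are assumptions made for brevity.

```python
from itertools import combinations


def induced_subnetwork(adjacency, attributes, value):
    """Keep only nodes with the given attribute value and ties among them."""
    keep = {node for node in adjacency if attributes.get(node) == value}
    return {node: adjacency[node] & keep for node in keep}


def average_clustering(adjacency):
    """Mean local clustering coefficient (0 for nodes with fewer than 2 ties)."""
    scores = []
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            scores.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neighbours, 2)
                    if b in adjacency[a])
        scores.append(2 * links / (k * (k - 1)))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical coauthorship network with a predicted gender per author.
coauthors = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d", "f"}, "f": {"e"},
}
gender = {"a": "female", "b": "female", "c": "female",
          "d": "male", "e": "male", "f": "male"}

female_net = induced_subnetwork(coauthors, gender, "female")
male_net = induced_subnetwork(coauthors, gender, "male")
print(average_clustering(female_net), average_clustering(male_net))
```

In this toy case the female subnetwork is a closed triangle and the male subnetwork an open chain, so the former has the higher clustering coefficient. The same extraction step applies to any node attribute, including predicted ethnicity.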


4.1 Reflections and challenges 


The aim of this collaboration was to build a bridge between the very different dis- 
ciplines of humanities and computer science. We faced several challenges during 
this work. The first was related to the algorithmic bias associated with the gender 
and ethnicity prediction algorithms. The gender prediction was based on the 
given name (or forename) of an author. This was a generalization that was neces- 
sary given the large number of authors and the limited personal information avail- 
able. First of all, we assumed that gender is binary, rather than more complex. 
Secondly, whether a given name is more commonly male or female may depend on the
country of origin. For example, the name “Andrea” is commonly feminine in many
countries but is widely used as a masculine name in Italy. Additionally, the
gender identity of a person may not match their biological sex.

The ethnicity prediction algorithm, on the other hand, is based on the family 
name (surname) and gender of an author. This is also a generalization, since a per- 
son’s cultural identity may be different from their ancestry (or indeed from their 
spouse’s ancestry where family names are changed on marriage). For example, 
many second- and third-generation American citizens have Italian surnames due 
to their Italian ancestry, while embracing an American identity. We also noticed 
that the gender prediction algorithm was less accurate for ethnic minorities. We
therefore decided to build two separate networks for our analysis: one containing
all authors whose gender prediction had a confidence score of at least
99 percent (i.e. an estimated 99 percent likelihood of being correctly assigned
as male or female), and another containing all authors whose ethnicity
prediction had a score of at least 50 percent (i.e. an estimated 50 percent
likelihood of belonging to a certain ethnicity versus all other ethnicities).
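
The thresholding step can be sketched as follows. The record layout and the field names (gender_score, ethnicity_score) are hypothetical and chosen only for illustration. They do not reproduce the actual output format of the Genderize API or the wru package.

```python
def filter_by_score(authors, score_key, threshold):
    """Keep only authors whose prediction score reaches the threshold."""
    return {name: record for name, record in authors.items()
            if record[score_key] >= threshold}


def restrict_edges(edges, kept_authors):
    """Keep only coauthorship ties whose two endpoints were both retained."""
    return [(a, b) for a, b in edges
            if a in kept_authors and b in kept_authors]


# Hypothetical author records with prediction scores between 0 and 1.
authors = {
    "author_1": {"gender_score": 0.995, "ethnicity_score": 0.81},
    "author_2": {"gender_score": 0.970, "ethnicity_score": 0.55},
    "author_3": {"gender_score": 0.999, "ethnicity_score": 0.42},
}
edges = [("author_1", "author_2"), ("author_1", "author_3"),
         ("author_2", "author_3")]

gender_authors = filter_by_score(authors, "gender_score", 0.99)
ethnicity_authors = filter_by_score(authors, "ethnicity_score", 0.50)
gender_edges = restrict_edges(edges, gender_authors)
ethnicity_edges = restrict_edges(edges, ethnicity_authors)
```

Because the two thresholds retain different authors, the two resulting networks differ in both nodes and ties. This is worth keeping in mind when results for gender and ethnicity are compared.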

The fact that the algorithms do not have 100 percent accuracy shows that 
the use of digital tools does not remove bias. Algorithms contain an intrinsic 
bias because they are designed by humans, and researchers also introduce bias 
when choosing a certain algorithm. 


5 Conclusion 


The main objective of this project was to show how humanities research can ben- 
efit from network analysis, by providing PhD students from different fields with 
the right tools to help answer their historical questions, and adapting these tools 
to their research projects. In this way, a fruitful collaboration — where both sides 
can benefit from each other — may be sought: humanities scholars gain a critical 
understanding of digital tools and their functionalities, while computer scientists 
find new use cases and applications, at the same time learning to understand 
the needs of humanists. Understanding each other’s needs is crucial for such 
collaborations. Instead of two distinct groups with separate interests, I envision 
humanists and computer scientists joining forces and sharing their knowledge 
and expertise in order to tackle the new challenges that are emerging in digi- 
tal humanities. Only with a common goal and a shared vision can this col- 
laboration be effective and still worth the time and effort required. 

This article describes how I reviewed the latest historical network research in 
order to assess the current practices of historians using network-based methods, 
and discusses some of the challenges faced in digital humanities. As part of this 
work I translated historical problems for computer science peers and explained 
the basics of social network analysis to historians. I have also presented a use 
case here, drawn from my collaboration with a historian colleague, showing how 
social network analysis can be used to answer historical research questions. In 
particular, I presented our joint research questions and the tools we used to an- 
swer them. Finally, I reflected on the challenges we encountered during our joint 
work, such as the generalizations that we made in order to model our scenario 
and the algorithm criticism regarding the gender and ethnicity predictions. 


Kaarel Sikk

Hunting for emergences in stone-age 
settlement patterns with agent-based 
models 


1 Introduction 


Complexity science focuses on explaining phenomena as systems composed of a 
multitude of components interacting with each other. This approach offers a 
good reflection of social systems which are composed of individuals. Social sci- 
entists have long been aware of how complex structures emerge from individual 
behaviors. During recent decades, researchers have also started to use complex 
systems to explore the past. These studies have mostly applied agent-based
models in the field of archaeology¹ and in fields specializing in modeling, for
example those under the umbrella term cliodynamics.² Until recently, the application
of complexity science has largely been neglected by humanists and historians in 
particular. 

This chapter discusses the opportunities offered by complexity science ap- 
proaches, and particularly agent-based modeling (ABM), for humanities schol- 
ars studying the past. The discussion is based on explorative interdisciplinary 
research applied to the emergence of settlement patterns, as observed in ar- 
chaeological material. Its main purpose is not to report the research results, 
which are published elsewhere,³ but to discuss the explorative process of the


1 Mark W. Lake, “Trends in Archaeological Simulation,” Journal of Archaeological Method and 
Theory 21, no. 2 (2014): 258-87; and J. Daniel Rogers and Wendy H. Cegielski, “Opinion:
Building a Better Past with the Help of Agent-Based Modeling,” Proceedings of the National
Academy of Sciences 114, no. 49 (2017): 12841-44.

2 For examples see: “Cliodynamics,” Wikipedia, accessed April 22, 2021,
https://en.wikipedia.org/wiki/Cliodynamics.

3 Kaarel Sikk and Geoffrey Caruso, “A Spatially Explicit Agent-Based Model of Central Place 
Foraging Theory and Its Explanatory Power for Hunter-Gatherers Settlement Patterns
Formation Processes,” Adaptive Behavior 28, no. 4 (2020): 1-21; Kaarel Sikk et al., “Environment and
Settlement Location Choice in Stone Age Estonia,” Estonian Journal of Archaeology 24, no. 2 
(2020): 89-140. 

ABM-driven research project, along with the additional value and unexpected
insights gained during the study. Elements of the research process that could
apply to other studies and fields are reflected upon using digital hermeneutics,
as put forward by historians, as a reference model.⁴ The interdisciplinary
contact points between the social and natural sciences and the humanities,
which form the basis of the study, are also discussed.

In the following sections, concepts of modeling, emergence, and complex 
systems are discussed from a humanist viewpoint, then an overview of a case 
study is presented and, based on the experience of the project, wider applica- 
tions of ABM practices are discussed. 

The research presented here is based on ideas from and cooperation with people
in an interdisciplinary context that includes fields like archaeology, history,
quantitative geography, economics, computer science, and complex systems modeling.⁵


2 Concepts and methods 
2.1 Emerging complexity 


Adoption of complex systems approaches becomes reasonable if the research 
object exhibits emergent behavior, which means that the system in general 
possesses properties that its individual elements do not have. Although research
on complex systems has intensified only quite recently, the general ideas behind
complexity science are in fact very old.

Emergence was already being described in the ancient world by philoso- 
phers like Aristotle who, in the earliest known such record, wrote in his Meta- 
physics: “In the case of all things which have several parts and in which the 
totality is not, as it were, a mere heap, but the whole is something besides the 
parts, there is a cause; for even in bodies contact is the cause of unity in some 
cases, and in others viscosity or some other such quality.”⁶


4 Andreas Fickers, “Hermeneutics of In-Betweenness: Digital Public History as Hybrid Practice,” 
in Handbook of Digital Public History, ed. Serge Noiret, Mark Tebeau, and Gerben Zaagsma (Berlin: 
De Gruyter, forthcoming). 

5 The author wishes to thank Andreas Fickers for building the research environment behind 
this research, Geoffrey Caruso for his guidance on quantitative geography, Aivar Kriiska for ar- 
chaeological data and insights, Juliane Tatarinov for initiating this volume, and Iza Romanow- 
ska for very helpful feedback on the research and text. 

6 Aristotle, Metaphysics, trans. William D. Ross (Oxford: Clarendon Press, 1908), 980a. 

During the nineteenth century the axiom that the whole is greater than the
sum of its parts (Renouvier), and Boutroux’s idea that higher levels of analysis 
are irreducible to the lower levels, became known among scholars studying so- 
ciety. Durkheim used these ideas to deduce the central concept of the newly 
born discipline of sociology: the sui generis, now referred to as emergence. 

There are some well-known iconic examples of emergent systems observed 
by science. For example, through physics we know about rules governing sub- 
atomic particles, but those rules do not inform us about the chemical properties 
of the substance formed by those particles. Rules and theories in chemistry are 
formulated for another scale of analysis. The science of biology in turn consid- 
ers life as an emergent property of chemical systems. Likewise human culture 
is not explained by the biological characteristics of humans but requires an- 
other level of observations. These gaps are unintuitive for the human mind and 
often form the boundaries of disciplines.⁷

Social emergence can be illustrated by the power law governing the distribution
of the connections that individuals have in a society, which emerges as a result
of a preferential attachment process. Individuals often prefer connections with
others who already have more connections, for example because of better access to
information or higher perceived trustworthiness (doing business with rich people,
being friends with people with more friends). This preference produces a
heavy-tailed, power-law distribution of connections (friends, wealth), and the
dynamic process described as “the rich get richer” emerges (the Matthew effect).
From an individual's point of view or level of analysis it might not be intuitive
that the general trend of being friends with popular people leads to an increase
in social inequality. This illustrates how phenomena usually observed at
different levels of analysis are interrelated.
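
The preferential attachment process can be illustrated with a very small simulation. The sketch below is a simplified variant written for this purpose, in which each newcomer forms exactly one connection. The function name and parameters are illustrative assumptions.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, seed=0):
    """Grow a network in which each newcomer links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    degree = Counter({0: 1, 1: 1})   # start from two connected individuals
    endpoints = [0, 1]               # every node appears once per tie it holds
    for newcomer in range(2, n_nodes):
        target = rng.choice(endpoints)   # well-connected nodes are listed more often
        degree[newcomer] += 1
        degree[target] += 1
        endpoints += [newcomer, target]
    return degree


degree = preferential_attachment(2000, seed=42)
histogram = Counter(degree.values())     # number of nodes per degree value
print("nodes with a single connection:", histogram[1])
print("largest number of connections:", max(degree.values()))
```

Although every newcomer follows the same simple rule, the outcome is highly unequal: most individuals end up with a single connection while a few accumulate a great many, which is the "rich get richer" pattern described above.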

The remarkable thing about society is the ease with which simple individual
rules and changes in individual connections lead to complexity. As Epstein and
Axtell put it, “it is not the emergent macroscopic object per se that is
surprising, but the generative sufficiency of the simple local rules.”⁹ In other
words, only very basic rules governing individual choices are required to form
complex systems with new properties.


7 See, for example, Mark Bedau and Paul Humphreys, Emergence: Contemporary Readings in 
Philosophy and Science (Cambridge, MA: MIT Press, 2008), 10-8. 

8 Simulation of evolution of power-law distribution of node degrees on synthetic network by 
using: Uri Wilensky, “NetLogo Preferential Attachment Model,” (Center for Connected Learn- 
ing and Computer-Based Modeling, Northwestern University, 2005), accessed December 1, 
2020, http://ccl.northwestern.edu/netlogo/models/PreferentialAttachment. 

9 Joshua M. Epstein and Robert Axtell, Growing Artificial Societies: Social Science From the 
Bottom Up (Washington, DC: Brookings Institution Press, 1996), 52. 

Analyzing emergent relations between different analytical levels became possible
only after the new fields of systems theory and cybernetics arose during the
1940s. New ideas morphed into the discipline of complexity science, which
provided a toolkit for studying complex systems involving relations between their
components and properties like adaptation, nonlinearity, spontaneous order,
feedback loops, and emergence. Agent-based modeling (ABM), an analytical
approach to solving systemic issues, was developed and became practically 
applicable during the computational revolution of the 1980s. 


2.2 Agent-based modeling - a tool for exploring complexity 


ABM is a computational simulation method developed to explore complex systems by
combining different levels of analysis. It lets us explore how the relatively
simple behaviors of system components lead to the general emergence of complex
phenomena. Building on the classical definition from Clarke, “a model is a
mechanism which connects theory to data,”¹⁰ ABM is a mechanism that enables us to
connect the theory of one level of analysis to data on another level.

The agent-based modeling process is accomplished in a number of key steps:¹¹

1) The characteristics of the environment and the rules governing individual 
agents (ontology) are defined. 

2) These characteristics and rules are then formalized as algorithms and their 
configurations, so that the latter can be executed as a computer program. 

3) The created models are calibrated to fit available observations. 

4) The models are validated to behave as expected (face validation). 

5) Any further analytical processes are performed, such as running simula- 
tions of scenarios which can be compared to empirical observations or the- 
ories, and model exploration to explain phenomena and build theories. 
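
To give a feeling for what the first two steps look like in practice, here is a deliberately tiny agent-based model. It is a hypothetical toy and not the settlement model developed in this research. The one-dimensional landscape, the movement rule, and all parameter names and values are assumptions made purely for illustration.

```python
import random
from collections import Counter


def run_model(width=20, n_agents=10, steps=50, regrowth=0.1, seed=0):
    """Toy model: agents on a strip of land move toward richer cells."""
    rng = random.Random(seed)
    # Step 1: environment (resource capacity per cell) and agents (positions).
    capacity = [rng.random() for _ in range(width)]
    resource = capacity[:]
    positions = [rng.randrange(width) for _ in range(n_agents)]
    # Step 2: the behavioral rule, formalized as an algorithm.
    for _ in range(steps):
        for i, pos in enumerate(positions):
            options = [p for p in (pos - 1, pos, pos + 1) if 0 <= p < width]
            best = max(options, key=lambda p: resource[p])
            positions[i] = best          # move to the richest cell in reach
            resource[best] *= 0.5        # consume half of what is there
        # Resources regrow a little each step, up to the cell's capacity.
        resource = [min(c, r + regrowth * c)
                    for c, r in zip(capacity, resource)]
    return positions, capacity


positions, capacity = run_model(seed=1)
occupancy = Counter(positions)           # emergent pattern: agents per cell
print(sorted(occupancy.items()))
```

Steps 3 to 5 would then consist of varying parameters such as regrowth until the simulated occupancy resembles observed patterns, checking that the model behaves as intended, and running further scenarios to explore which local rules are sufficient to generate the pattern of interest.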


ABM as a simulation technique enables us to explore scenarios that cannot be
observed in empirical reality¹² and thus brings the experimental method to
disciplines usually limited to description and the comparative method.


10 David L. Clarke, “Models and Paradigms in Contemporary Archaeology,” in Models in Ar- 
chaeology, ed. David L. Clarke (London: Methuen, 1972), 1-60. 

11 For an overview of the ABM workflow carried out for this research see Section 3, including 
Figure 1. 

12 James McGlade, “Systems and Simulacra: Modeling, Simulation, and Archaeological Inter- 
pretation,” in Handbook of Archaeological Methods, ed. Herbert D. G. Maschner and Christo- 
pher Chippindale (Oxford: Altamira Press, 2005), 558. 
ABM can be used to build and test theories of individual behaviors by projecting
them onto different social and spatial scales.¹³ These scales constitute
different levels of observation and analysis. For example, written sources
describe individuals’ perceptions, while archaeological observation could provide
an aggregate understanding of dynamic phenomena in general.

The literature on ABM for historical scholarship has so far mainly been limited
to discussion of the potential use of ABM.¹⁴ Nanetti and Cheong discussed
how narrative-driven analysis of historical big data can lead to the development
of explanatory agent-based models in the genre of counterfactual history, one
possible application of ABM.¹⁵ Some studies utilizing ABM¹⁶ include research
on infantry tactics,¹⁷ the infrastructure projects of antiquity,¹⁸ and maritime trade.¹⁹

The situation is different in the field of archaeology, where ABM has seen
considerable success in recent years. This may be due to the more quantitative
nature of the discipline, whose sources reflect the aggregated activities of
people of the past and are thus easier to project onto different scales of time
and space.²⁰


13 Timothy A. Kohler, George J. Gumerman, and Robert G. Reynolds, “Simulating Ancient So- 
cieties,” Scientific American 293, no. 1 (2005): 76-84. 

14 Michael Gavin, “Agent-Based Modeling and Historical Simulation,” Digital Humanities
Quarterly 8, no. 4 (2014); Marten Düring, “The Potential of Agent-Based Modelling for
Historical Research,” in Complexity and the Human Experience: Modeling Complexity in the
Humanities and Social Sciences, ed. Paul A. Youngman and Mirsad Hadzikadic (Singapore: Pan
Stanford Publishing, 2014), 121; and Edmund Chattoe-Brown and Simone Gabbriellini, “How
Should Agent-Based Modelling Engage With Historical Processes?,” in Advances in Social
Simulation 2015, ed. Wander Jager et al. (Cham: Springer, 2017), 53-66.

15 Andrea Nanetti and Siew Ann Cheong, “Computational History: From Big Data to Big Simu- 
lations,” in Big Data in Computational Social Science and Humanities, ed. Shu-Heng Chen, 
Computational Social Sciences (Cham: Springer International Publishing, 2018), 337-63. 

16 For an overview see: Dominik Klein, Johannes Marx, and Kai Fischbach, “Agent-Based 
Modeling in Social Science, History, and Philosophy. An Introduction,” Historical Social Re- 
search/Historische Sozialforschung 43, no. 1 (2018): 7-27. 

17 Xavier Rubio-Campillo, Jose Maria Cela, and Francesc Xavier Hernàndez Cardona, “The
Development of New Infantry Tactics During the Early Eighteenth Century: A Computer
Simulation Approach to Modern Military History,” in Agent-Based Modeling and Simulation, ed.
Simon Taylor (Cham: Springer, 2014), 208-30.

18 J. Riley Snyder et al., “Agent-Based Modelling and Construction — Reconstructing Antiq- 
uity’s Largest Infrastructure Project,” Construction Management and Economics 36, no. 6 
(2018): 313-27. 

19 Ulf Christian Ewert and Marco Sunder, “Modelling Maritime Trade Systems: Agent-Based 
Simulation and Medieval History,” Historical Social Research/Historische Sozialforschung 43, 
no. 1 (2018): 110-43. 

20 Lake, “Trends in Archaeological Simulation,” 258-87. 
ABM has been used to explore hominid dispersal,²¹ hunter-gatherer foraging²² and
settlement choice,²³ the agriculture and economy of Neolithic village communities,²⁴
the social and economic organization of ancient civilizations,²⁵ and cultural
transmission,²⁶ among other topics.

The essence of ABM practice in archaeology lies in formulating the individ- 
ual behaviors as choice rules, running the model, and comparing the simulation 
output to corresponding observations from empirical material. As archaeologists 
do not typically have access to knowledge about individual behaviors in the 
past, anthropological universals, contemporary analogs, and other disciplines are
used to define them. 


3 Studying settlement choice using ABM


3.1 Case study: The Stone Age settlement of Estonia


The research question for this case study was prompted by the notion among a
group of Estonian and Finnish archaeologists that it is relatively easy to find
settlement sites from the late Mesolithic Narva stage (5200-3900 BC) and early


21 Steven Mithen and Melissa Reed, “Stepping Out: A Computer Simulation of Hominid Dis- 
persal from Africa,” Journal of Human Evolution 43, no. 4 (2002): 433-62; Iza Romanowska 
et al., “Dispersal and the Movius Line: Testing the Effect of Dispersal on Population Density 
Through Simulation,” Quaternary International 431 (2017): 53-63. 

22 Mark W. Lake, “MAGICAL Computer Simulation of Mesolithic Foraging,” in Dynamics in 
Human and Primate Societies: Agent-Based Modelling of Social and Spatial Processes, ed. Timo- 
thy Kohler and George Gumerman (Oxford: Oxford University Press, 2000), 107-43; Marco 
A. Janssen and Kim Hill, “An Agent-Based Model of Resource Distribution on Hunter-Gatherer 
Foraging Strategies: Clumped Habitats Favor Lower Mobility, but Result in Higher Foraging 
Returns,” in Simulating Prehistoric and Ancient Worlds, Computational Social Sciences, ed. Juan 
A. Barceló and Florencia Del Castillo (Cham: Springer, 2016), 159-74.

23 John H. Christiansen and Mark Altaweel, “Simulation of Natural and Social Process Inter- 
Neolithic Comb Ware period (3900-1800 BC; referred to together as NCW) on the
landscape, but that sites from the Corded Ware stage (CWC; 2800-2000 BC) are only
found by chance.²⁷ We can rephrase this by saying that archaeologists’ implicit
mind-models can predict the locations of the first group of sites but are
unsuccessful for the second group.

The effectiveness of archaeological predictive models (here, we consider
mind-models to belong to this group) has been thoroughly discussed, and it has
been hypothesized that, as social complexity grows, the direct relationship
between settlement choice and environmental conditions weakens.²⁸ The case
study presented here explored this hypothesis as a cause of differences in the
environmental predictability of settlement locations.

To do so, the research project integrated empirical data and theories of set- 
tlement pattern formation, including two levels of analysis. The empirical level 
was represented by the settlement locations of the given periods and the envi- 
ronmental conditions associated with those locations. Settlement systems can 
be approached as emergent phenomena formed by individuals making their de- 
cisions of where to live, which constitutes another theoretical level of analysis. 
Scholars have implicitly used this perspective but explicit approaches have 
been less explored so far. Separate levels of analysis and the complex nature of 
the formation process suggested ABM as an appropriate research tool to pro- 
pose hypothesized models of individual behavior and test them against empiri- 
cal observation. 

Using ABM set several requirements that had to be met in order to build,
calibrate, validate, and interpret the model. Although an ABM can be developed
from verbal theories²⁹ and validated qualitatively against descriptions,
quantitative modeling steps were essential and considerably influenced the
current research process. These steps formed the research framework illustrated
in Fig. 1 and discussed in the following sections.


27 Sikk et al., “Environment and Settlement,” 91. 

28 Jeffrey H. Altschul, “Models and the Modelling Process,” in Quantifying the Present and 
Predicting the Past: Theory, Method, and Application of Archaeological Predictive Modeling, ed. 
James Judge and Lynne Sebastian (Denver: US Department of the Interior, Bureau of Land 
Management, 1988), 61-96; Kenneth L. Kvamme, “There and Back Again: Revisiting Archaeo- 
logical Locational Modeling,” in GIS and Archaeological Site Location Modeling, ed. Mark 
W. Mehrer and Konnie L. Wescott (Boca Raton, FL: CRC Press, 2005), 23-55.

29 Paul Smaldino, “How to Translate a Verbal Theory into a Formal Model,” preprint (MetaAr- 
Xiv, 26 May 2020). 




[Fig. 1 flowchart. Labels, in extraction order: Theoretical domain; Theory (sources: ecology, economy, geography, archaeology); Model interpretation; “not valid”; ABM modelling; Hypothesis; Conceptualization & Development; Experiments & model exploration; ABM validation; Calibration; Empirical domain; Statistical model; Spatial model; Data model: database; Data sources (sources: archaeology, environment, geology).]


Fig. 1: ABM-driven research process used in the current research. The flowchart illustrates the 
process starting from a hypothesis, through proposing a model, and ending with model 
interpretation which results in new theory building. 2020. © Kaarel Sikk. 


3.2 Data modeling 


Data extraction and modeling involve defining entities of interest and their
available and relevant characteristics.³⁰ Although this is a prerequisite for the
subsequent modeling steps, it requires knowledge of both the empirical data and
the related theoretical frameworks. In the current study, preliminary versions of
both the empirical and the conceptual model were required before the final data
structure could be decided upon, so the two had to be developed in parallel.


30 The annual conference series “Computer Applications and Quantitative Methods in
Archaeology” dedicated to this topic has been running since 1973. For proceedings see:
https://caa-international.org/proceedings/published/, accessed July 27, 2021.
In addition to both empirical and theoretical explorations of available
knowledge, data modeling also exposed three essential issues typical of
quantitative studies of the past:

- contemporary conditions (environment) are different from those of the past 

— the extent to which past environmental attractions can be observed in 
available variables is not known 

— whether the mind-model of the archaeologists has already introduced a 
bias in the current knowledge. 


To address these issues several new steps were introduced into the research 
process. The first of them required interdisciplinary cooperation with geologists 
who provided past landform and shoreline reconstruction models representing 
the periods of interest. 

The second issue was addressed by constructing a statistical model, which showed
a strong relation between environmental variables (e.g., distance to water, soil
type, geomorphological derivatives) and settlement choice (see Section 3.3).

Data bias is a well-known issue to archaeologists³¹ and, as one critical comment
by a reviewer stated, it is often considered to invalidate the results, without
delving into complex theoretical frameworks. In the current case the survey
strategies were studied and it was found that most recent surveys have ventured
beyond the predicted areas and undertaken additional trips in order to validate
knowledge.³² This makes the current knowledge significantly more robust, although
awareness of possible bias is always required when interpreting archaeological
results.


3.3 Statistical model of empirical data 


A statistical model was created mainly as an evaluation of available environmental 
variables, to explore their relation to settlement choice. Statistical analysis was 
used to find and describe regularities in the empirical data. The dependence of set- 
tlement patterns on environmental conditions has been thoroughly researched in 
archaeology, with studies carried out since the 1970s. Later the exploration
continued mostly with GIS-based predictive models for archaeological site prospection.³³


31 David Wheatley, “Making Space for an Archaeology of Place,” Internet Archaeology 15 (2004). 
32 Sikk et al., “Environment and Settlement,” 110. 

33 William James Judge and Lynne Sebastian, Quantifying the Present and Predicting the Past: 
Theory, Method, and Application of Archaeological Predictive Modeling (Denver: US Department 
of the Interior, Bureau of Land Management, 1988); Mark W. Mehrer and Konnie L. Wescott, 
GIS and Archaeological Site Location Modeling (Boca Raton, FL: CRC Press, 2005).
The analysis of the current data confirmed the relation between environment
and settlement choice and identified useful variables describing it.
Some of the results, like the sites’ proximity to water bodies in dry, sandy areas,
were already known to archaeologists. Others added new insights, including
the rugged nature of the preferred environment and the relative position of
sites in local topography. The statistical analysis served as a tool for data
reduction and helped to assess which variables reflected changes in settlement
choice and were thus useful to include in further analysis.

This step revealed differences in settlement choice logic between the CWC
and NCW settlements, with the former being less constrained by water bodies and
in general situated in higher locations.³⁴ The selection of variables (e.g.,
distance to water, soil type) and measures of their effect on settlement choice
pushed the boundaries of interpretation and led to alternative hypotheses for
explaining the data.


3.4 Spatial model 


A spatial model was constructed in order to quantitatively assess the initial obser- 
vation that the settlement choice of the CWC phenomenon was less predictable 
than that of the earlier periods. The analysis was done using methodologies from 
archaeological predictive modeling and eco-cultural niche modeling - the latter 
provided several additional measures and niche-related concepts from ecology. 

Knowledge of the relation of environmental variables to settlement choice was 
extrapolated to the whole research area by creating a spatial inductive logistic re- 
gression model. The resulting probability rasters represented the environmental 
residential suitability maps associated with the two studied settlement systems. 
Created models could be compared for both the environmental influences and the 
spatial configurations of suitable areas. Comparison of the features confirmed the 
hypothesis that during the CWC stage the settlement choice was less restricted by 
environmental conditions. 
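
As a rough illustration of how an inductive logistic regression can be turned into a probability raster, the following sketch fits a model to a handful of invented training locations and then predicts a value for every cell of a tiny grid. The two variables (distance to water, sandy soil), the data values, and the plain gradient-descent fitting routine are assumptions made for the example; the study itself worked with GIS layers and is not reproduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Probability that a location with these features holds a settlement."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            err = predict(weights, features) - label
            grad[0] += err
            for j, x in enumerate(features):
                grad[j + 1] += err * x
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights

# Invented training data: (distance to water in km, sandy soil 0/1).
sites = [(0.10, 1), (0.20, 1), (0.30, 0), (0.15, 1)]        # known settlements
background = [(0.90, 0), (0.70, 0), (0.80, 1), (0.60, 0)]   # locations without finds
X = sites + background
y = [1] * len(sites) + [0] * len(background)
weights = fit_logistic(X, y)

# Extrapolate to every cell of a (tiny) raster of environmental variables.
distance_raster = [[0.1, 0.4, 0.9], [0.2, 0.5, 0.8]]
sandy_raster = [[1, 1, 0], [1, 0, 0]]
suitability = [[predict(weights, (d, s)) for d, s in zip(d_row, s_row)]
               for d_row, s_row in zip(distance_raster, sandy_raster)]
```

Fitting one such model per settlement system yields two suitability rasters whose coefficients and spatial patterns can then be compared.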

Several spatial measures like spatial clustering and niche breadth were ex- 
perimented with and provided measures to compare simulation results to em- 
pirical reality, thus helping to validate them. Through the modeling process the 
epistemological meaning changed from economically evaluating individual lo- 
cations for potential archaeological remnants to reconstruction of the past 


34 Sikk et al., “Environment and Settlement,” 107-10. 
vision of the landscape. The spatial interpretation of empirical data was a step
closer to expressing individual perception of the landscape.
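
The chapter does not state which formulas were used for niche breadth and spatial clustering. As a hedged illustration, the sketch below uses two common choices: the standardized Levins index for niche breadth and the Clark-Evans nearest-neighbor ratio for clustering. Both are assumptions, as are the function names.

```python
import math

def levins_niche_breadth(suitability):
    """Standardized Levins breadth: 0 = one cell holds all suitability, 1 = uniform."""
    total = sum(suitability)
    shares = [s / total for s in suitability]
    breadth = 1.0 / sum(p * p for p in shares)
    return (breadth - 1.0) / (len(suitability) - 1)

def mean_nearest_neighbor(points):
    """Mean distance from each point to its closest other point."""
    return sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / len(points)

def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbor distance over its random-pattern expectation.

    Values below 1 suggest clustering, values above 1 dispersion.
    No edge correction is applied in this sketch.
    """
    expected = 0.5 / math.sqrt(len(points) / area)
    return mean_nearest_neighbor(points) / expected
```

Because both measures can be computed for simulated as well as for empirical settlement patterns, they offer a simple way to compare the two.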

In addition to confirming the initial hypothesis of decreasing environmental
influence, this enabled new interpretations of the importance of the spatial
structure of past perception of habitation areas. For this, the new concept of a
residential suitability model (RSM) was developed, which was interpreted as the
perceived potential of locations in an area for living and which is technically
identical to niche models and archaeological predictive models. It could be
asked: What are the differences between the suitable habitation areas, as
perceived by people of the early Neolithic and the CWC cultures? It also helped
to define hypotheses for explaining the differing spatial structures of the RSMs
of the settlement systems. Those hypotheses included the different mobility modes
of the periods, growing social complexity, and technological innovation making
wider areas usable for agriculture.


3.5 Agent-based framework 


The central goal of ABM is to explain complex relations between processes that 
are out of the reach of verbal arguments by proposing a model which can be 
validated against empirical data. The model can then be explored further, thus
building theory through interpreting it. For the current study the goal was to build a
simulation model that produced synthetic data that could be compared to ar- 
chaeological empirical data and through it to explain the observed variations in 
settlement systems. The foundation of such a model is the conceptualization of 
theoretical knowledge of individual behaviors. 

The conceptual model was constructed to formally describe the settlement 
pattern formation process as cumulative settlement choices. The conceptualiza- 
tion drew from studies in ethnography and economic geography, incorporating 
abstract concepts most of which have previously been discussed in the context 
of archaeology. The conceptual model describing how people choose a place to 
live was based on theories from archaeology where most of the basic principles 
had been debated during the 1970s.³⁵

Constructing the abstract conceptual model was helped by the fact that gen- 
eral theories of settlement choice are similar in those fields and the main differen- 
ces come from the empirical data used to back them. For example, archaeologists 


35 Carole L. Crumley, “Three Locational Models: An Epistemological Assessment for Anthro- 
pology and Archaeology,” Advances in Archaeological Method and Theory 2 (1979): 141-73. 
could categorize influences on choice as social (hypothetical) and environmental
(partly observable) influences, but geographers would group influences by their 
spatial characteristics. 

Individual agents’ selection of residence was formulated using principles of 
discrete choice, with every location having an abstract utility value for a settle- 
ment. From contemporary experience we know that there exist a multitude of 
factors influencing residential choice, for example access to a workplace and es- 
sential services, social context, the feeling of belonging to a group, and general 
environmental conditions. Such factors depend on the society observed, but to
abstract prehistoric settlement choice we categorized them into two major groups:
influences arising from the social domain (other people) and those related to the 
physical environment. 

The utility value of each location was then determined by a utility function 
composed of factors categorized as access to either ecosystem services or social 
services. Under ecosystem services we grouped factors like access to local shel- 
ter, drinking water, and a dry location, as well as access to fertile agricultural 
lands and hunting grounds. Social services include the benefits of keeping in 
contact with other people, including the availability of specialized goods, trade, 
and cultural and other benefits which can be associated with greater social
complexity. It must be noted that neither group is completely or directly
observable in the archaeological record, but ecosystem services are certainly
better represented through environmental variables.
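
A utility function of this kind, combined with a discrete choice rule, might look as follows. The text only states that utility is composed of ecosystem and social services; the linear weighting by `w_social`, the multinomial logit form, and the `scale` parameter are assumptions chosen for the sketch, not the published specification.

```python
import math
import random

def utility(ecosystem, social, w_social):
    """Weighted sum of access to ecosystem services and to social services."""
    return (1.0 - w_social) * ecosystem + w_social * social

def choice_probabilities(utilities, scale=1.0):
    """Multinomial logit: the probability of each location given its utility.

    A smaller `scale` makes the choice more deterministic.
    """
    top = max(utilities)  # subtracting the maximum keeps exp() numerically stable
    exp_u = [math.exp((u - top) / scale) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

def choose(utilities, rng, scale=1.0):
    """Draw the index of one location according to the logit probabilities."""
    probs = choice_probabilities(utilities, scale)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]
```

For example, three candidate locations with utilities `[0.2, 0.9, 0.5]` would be chosen with probabilities that rank in the same order, the second location being the most likely.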

A functional simulation model was constructed based on the conceptual 
model, and synthetic environments were generated, with ecosystem services 
and agent populations forming dynamic social attractions. Each agent in the 
system model represented a community that formed a residential settlement. 
Agents were made mobile and assigned a goal of searching for the best location 
in the randomly generated environments, using varying influences. 

One of the powers of ABM is the ease of going directly from the conceptual 
model to the simulation model, thus enabling model exploration techniques 
to be used to gain theoretical insights. Exploration of the conceptual model
showed that, for settlement choice, factors requiring access over longer
distances, such as trade, were of lesser importance, which helped to gauge the
significance of the relation between different environmental data and this decision.
Although the result may be intuitive, ABM provided quantitative assessment 
of significant ranges of individual environmental influences. For example, 
local conditions influence specific location choice significantly more than ac- 
cess to resources in daily walking distance does. 
3.6 ABM experiments


Three extended models were created to run simulation experiments testing the
hypotheses. The first experiment was designed and run to explore resource
depletion, which has long been considered the driving force of hunter-gatherer
mobility. A central place foraging (CPF) ABM implementation was created and
showed that, although the resource-depletion-based model is very useful for
explaining mobility, it has only a modest impact on settlement location choice
principles.³⁶ The experiment indicated that differing mobility was not the cause
of the differences between the settlement systems of the periods concerned.

Another simulation experiment tested different variations of the utility
function, with agents prioritizing either environmental benefits or social
connectivity. As expected, in simulation runs with agents prioritizing social
services more highly, the environmental value of location was sacrificed,
resulting in greater population clustering (Fig. 2); in the reverse case, the
population was generally more dispersed.³⁷ The model confirmed the intuitive idea
that, with greater social complexity, the selection of suitable sites was less
environmentally determined, but it added a spatial factor: it should also result
in higher population clustering.
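
The logic of this experiment can be illustrated with a deliberately small simulation. It is a sketch under invented assumptions, not the published model: the environment is random noise on a square grid, agents settle one at a time on the free cell with the highest utility, and social services are approximated by the share of existing settlements within a fixed radius.

```python
import math
import random

def simulate(w_social, size=30, n_agents=20, radius=1.5, seed=0):
    """Place agents one by one on the free cell with the highest utility."""
    rng = random.Random(seed)
    # A random environmental value per cell stands in for ecosystem services.
    env = {(r, c): rng.random() for r in range(size) for c in range(size)}
    settled = []
    for _ in range(n_agents):
        def utility(cell):
            # Social services: share of existing settlements within `radius`.
            near = sum(1 for s in settled if math.dist(cell, s) <= radius)
            social = near / len(settled) if settled else 0.0
            return (1.0 - w_social) * env[cell] + w_social * social
        free = [cell for cell in env if cell not in settled]
        settled.append(max(free, key=utility))
    return settled

def mean_nearest_neighbor(points):
    """Mean distance from each settlement to its closest neighbor."""
    return sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / len(points)

# Two scenarios: agents valuing only social services vs. only the environment.
social_first = mean_nearest_neighbor(simulate(w_social=1.0))
environment_first = mean_nearest_neighbor(simulate(w_social=0.0))
```

With purely social agents every newcomer settles next to an existing settlement, so the mean nearest-neighbor distance stays within the interaction radius, whereas purely environmental agents scatter across the best cells of the landscape.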

Running a dedicated simulation experiment testing different spatial
configurations of the environment led to another unexpected insight. The
simulations revealed that the spatial autocorrelation of attractions in the
landscape influences the emergence of settlement systems and population clustering.
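
Spatial autocorrelation of a gridded environment can be quantified in several ways; the text does not name the measure used. One widespread option, shown here purely as an assumption, is Moran's I with rook (edge-sharing) neighbors.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid with rook (edge-sharing) neighbors."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[v - mean for v in row] for row in grid]
    cross = 0.0    # sum of dev_i * dev_j over all neighboring pairs
    n_links = 0    # number of neighbor links (each pair is counted twice)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    n_links += 1
    variance_sum = sum(d * d for row in dev for d in row)
    return (n / n_links) * (cross / variance_sum)

# A smooth west-east gradient versus a checkerboard of alternating values.
smooth = [[c for c in range(4)] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
```

Values near 1 indicate smooth environments in which similar cells lie next to each other; values near -1 indicate sharply alternating ones. Generating synthetic landscapes across this range is one way to set up the kind of experiment described above.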

ABM enabled the conceptualization of the rather abstract but essential idea 
of a residential suitability model, as mentioned in Section 3.4. The most fruitful 
of the unexpected insights that came purely from ABM simulations was the
understanding that the spatial configuration of attractions in the landscape
influences the emergence of settlement systems and population clustering.

It was personally interesting to observe how the ABM modeling process surprised
and played with the researchers’ intuition.³⁸ The simulation results were
sometimes the opposite of their initial intuition but, after visual observation of
the simulations, previously counterintuitive results started to seem intuitive. I


36 Sikk and Caruso, “Spatially Explicit Agent-Based Model,” 1-21. 

37 Kaarel Sikk, Geoffrey Caruso, and Aivar Kriiska, “Conceptual Framework of Assessing the 
Influence of Cultural Complexity to Settlement Pattern Formation,” (paper presented at Con- 
ference on Complex Systems, Thessaloniki, 2018). 

38 Andre Costopoulos and Mark W. Lake, Simulating Change: Archaeology into the Twenty- 
First Century (Salt Lake City: University of Utah Press, 2010). 
experienced a similar situation while observing how smoother environments
resulted in clustered populations and vice versa (Fig. 2), depending on the scale
and importance of environmental variables. Intuition tricked the researchers’
minds again when the resulting dynamics changed on introducing the mechanics of
resource depletion.


4 Discussion on ABM and studies of the past


4.1 ABM as a “thinkering” tool


ABM is intended for exploring complex systems with emergent properties and 
other characteristic features. Some results of the current as well as other
archaeological papers can also be described using simpler analytical models. As
using the simplest possible method is a general scientific principle,³⁹ a critique
of the use of ABM in archaeology is to ask: Is ABM really needed to confirm a theory?

Experience in archaeology, including the current project, indicates that
ABM has proven its value as a tool to “thinker” with,⁴⁰ even if emergent
properties are not expected to be found. We argue that the process of developing
an ABM by formulating theories algorithmically is a very rewarding part of
the research. Its unexpected additional knowledge gain often leads to new
approaches, concepts, and research questions.⁴¹ This benefit of modeling is
especially valuable when dealing with the complexity of social systems, still
relatively unexplored in the humanities.

The explorative power of ABM is realized through the development process 
and the methodological toolkit associated with it. In addition to domain knowl- 
edge of the subject matter, this development requires researchers to be able to 
express their ideas algorithmically — a formal expression that forces them to ex- 
plicitly state their knowledge and re-evaluate existing perspectives. It also 
opens up new angles to a research subject, with the challenge to select the most 
relevant one, thus requiring multiperspective exploration. 


39 Iza Romanowska, “So You Think You Can Model? A Guide to Building and Evaluating Ar- 
chaeological Simulation Models of Dispersals,” Human Biology 87, no. 3 (2015): 169-92. 

40 Erkki Huhtamo, “Thinkering with Media: On the Art of Paul DeMarinis,” in Paul DeMarinis:
Buried in Noise, ed. Ingrid Beirer et al. (Heidelberg: Kehrer Verlag, 2010), 33-46; and Andreas
Fickers and Tim van der Heijden, “Inside the Trading Zone: Thinkering in a Digital History
Lab,” Digital Humanities Quarterly 14, no. 3 (2020).

41 Costopoulos and Lake, “Simulating Change.” 
Archaeologists have rather successfully developed a gut feeling for settlement
locations from their experience of different landscapes. While searching for
undiscovered settlement sites they use their mind-model empathetically: Where 
would I have camped or settled, in the past? The process is similar to agent- 
based modelers modeling the social system and describing the rules governing 
an individual (self) making a choice: “If I were to move it would (probably) be to 
a better place.” This very basic starting statement already raises several new 
questions that need to be solved and leads to a chain of “thinkering” exercises, 
experimenting with the synergy of empathetic and rule-based thinking. 

For researchers familiar with the algorithmic toolkit, ABM provides a
surprisingly intuitive process giving reflexive feedback and new perspectives
on existing knowledge. These perspectives often lead to reconceptualizations of 
subject matter. In the current research a significant development was the recon- 
ceptualization of archaeological predictive models as residential suitability 
models. 


4.2 ABM as an interdisciplinary trading ground


The research process showed that the skeptical view that modeling practices sup- 
press multiperspective approaches was unfounded - and that in fact the opposite 
was true: describing influences on human choices required searching for new per- 
spectives in order to describe the system as a whole in the most effective way. 

Using ABM almost universally forces the researcher to enter an interdisci- 
plinary trading ground and search for fields where specific problems have been 
solved in the most efficient way. The statement quoted in Section 4.1 expressed
settlement choice as perceived by a person; the question of how to describe such a
choice formally can be pursued in anthropology, psychology, or economics, as
archaeologists often do. Although the developed settlement choice model focused
on spatial aspects, it required mapping a wide range of literature from different
sources and used input from various domains, including geology and ecology.

Formal models are descriptions of a phenomenon with all irrelevant domain 
knowledge stripped out, which makes it possible for specialists from different 
fields to understand and evaluate the model and reproduce the research results. 
As a visual diagram can generally be read without knowing the scientific details 
of a topic, so an ABM can also be read and understood by anyone who has mas- 
tered the language of its development. This makes formal models efficient inter- 
disciplinary communication tools. 

In the case of ecology and archaeology, for example, there has been an 
exchange regarding predictive models of animal niches and archaeological 
settlement sites. Despite these being different domains, the literature is
easily understandable by researchers, and joint methodological developments
have even led to the new field of eco-cultural niche modeling.⁴² In the current
research, geological paleoreconstruction models were directly usable as
input to archaeological models.

Following are some of the interdisciplinary points of contact that were
communicated through modeling practices:

— archaeological and environmental data, through data modeling 

— a paleoenvironment reconstruction model, with geologists using GISs and 
existing paleoenvironmental proxies 

— spatial statistics, with inductive spatial models and geographical tools to 
compare them 

— conceptual ABM integrating theoretical frameworks from economics, urban 
geography, and ethnography 

- model exploration techniques for assessing and building theories and inter- 
preting empirical data. 


The techniques used in the current study also have surprising connections to 
very different fields. For example, inductive models used for predicting site lo- 
cations are algorithmically identical to the ones used for text analysis, such as 
for topic modeling. They even share typical prediction algorithms (e.g., logistic 
regression and MaxEnt) and similarly produce a classification (e.g., habitation 
suitability versus topic) that can be used by scholars for searching (e.g., new 
sites versus new insights). These models add a new dimension of observation: 
in the case of topic modeling this might, for example, create a temporal dy- 
namic description and in the case of archaeological sites the model can provide 
the spatial structure of a suitable area. Although their fields of research are
very different, scholars working with the same algorithm can create a surprisingly
effective channel of communication.

ABM provides even more potential trading ground in the humanities. The 
individual-based approach enables more abstract models to be extended to rep- 
resent particular cases in different fields. So a conceptual model of residential 
choice could be extended to represent hunter-gatherers on the landscape, or 
people living in early towns, but also global processes of immigration. 


42 William E. Banks et al., “Eco-Cultural Niche Modeling: New Tools for Reconstructing the 
Geography and Ecology of Past Human Populations,” PaleoAnthropology 4, no. 6 (2006): 
68-83. 
5 Conclusion: Remarks on the general usability
of ABM for exploring the past 


Systems modeling can be applied in cases where generalizations are relevant. 
When exploring an individual biography, or a narrative with no regularities, 
ABM practices might not contribute. But ABM can be applied when two levels 
of analysis, such as individuals and social groups, are included in the
research. Navigating between these is quite intuitive in everyday life: for example,
when talking about individuals and their stories we tend to see them as unique,
but when considering a person’s social role we classify and generalize.

Because of its more general level of usage, ABM’s potential use in humani- 
ties might have similarities with prosopography, the study of common features 
in historical social groups, as it is not in a constant search for the exceptional 
and unique. 

In archaeology, ABM, among other modeling techniques, has seen consider- 
able success. This may be explained by the discipline’s close relation to natural 
sciences and the pattern-like nature of archaeological data. It is also relevant that 
the data collection procedure used in excavation is a quantitative process. The ar- 
chaeological record is organized by units of different scale like region, site, archae- 
ological context, and artifact. The essential element of archaeology, the dating of 
items and contexts, traditionally uses the stratigraphical method borrowed from 
geology and has developed its own statistical methods, from seriation (introduced 
by Petrie) to radiocarbon dating interpreted through Bayesian statistics. 

This indicates that a successful ABM project depends on proven formal 
frameworks and sufficient amounts of quantitative data, collected in a system- 
atic fashion, so as to serve as a proxy for studied phenomena. Additionally, the 
observed sociocultural processes must be of sufficient scale to generate regular- 
ities that can be isolated from chaotic or unobservable randomness. 
Shohreh Haddadan
Argument structures of political debates 
Annotation, extraction, and applications 


1 Introduction 


Argument mining has been a popular application in natural language process- 
ing (NLP) in recent years. Finding the structure of an argument from unstruc- 
tured resources facilitates the analysis of the huge amounts of data that are 
available in these modern times, whether born-digital web content or digitized 
resources processable by machines. One of the potential fields of argument is 
that of the political debates in which candidates argue adversarially over topics 
put to them, in order to persuade the audience of their competence to be ap- 
pointed by them to a post. Presidential election debates in the United States 
have, in some cases, been proven effective in this persuasion. 

In this study I am interested in the algorithmic extraction of the argument 
structures in these debates. In Section 2 of this chapter, I discuss why this 
study is considered an interdisciplinary field and how the study and its results 
could relate to digital history and hermeneutics. Section 3 investigates in detail
the need for annotation in digital humanities studies and the annotation process
approach I took in this research. In Section 4 I explain how, in my research, I 
have implemented NLP algorithms to extract the argument structures from a 
political debate dataset and evaluate the results — while in Section 5 I describe 
some applications of the extracted argument structures. 

Finally, in Section 6, I critically reflect on the transformation of the digital 
data source as well as the methodology, including the annotation process and 
NLP techniques used in this research. 


1.1 Research goal and questions


In this study, I focus firstly on annotation and then extraction of the logical
structure of the arguments provided by presidential candidates in the US
presidential election debates from 1960 to 2016.


Acknowledgements: I would like to thank Prof. Dr. Christof Schöch, University of Trier (GER)
and Dr. Lincoln Mullen, Roy Rosenzweig Center for History and New Media, Virginia (USA) for 
their insightful feedback on earlier drafts of this paper, and the Luxembourg National Re- 
search Fund (FNR) (10929115), who funded my research. 


Open Access. © 2022 Shohreh Haddadan, published by De Gruyter. This work is licensed
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783110723991-004 




Most prominently in argument mining research, an argument’s main com- 
ponent is a claim which embeds the goal of the argument. Thereafter the claim 
needs to be supported by evidence or premises. 

The main goal of my research is to algorithmically identify the argument struc- 
ture in political debate data - i.e. to find an algorithm which can identify how the 
argument is shaped, ultimately achieving an argument structure such as that de- 
picted in Fig. 1, which is based on a statement made by Senator John F. Kennedy 
in the 26 September 1960 debate against Vice President Richard Nixon. 


[Figure 1 diagram: the component “In my judgement, the hard money, tight money policy, fiscal policy of this administration has contributed to the slow-down in our economy” is linked to three further components: “which helped bring the recession of fifty-four,” “which made the recession of fifty-eight rather intense,” and “which has slowed somewhat our economic activity in 1960.”]

Fig. 1: Argument structure extracted from Kennedy-Nixon debate on 26 September 1960. 
KENNEDY: In my judgment, the hard money, tight money policy, fiscal policy of this 
Administration has contributed to the slow-down in our economy, which helped bring the 
recession of fifty-four; which made the recession of fifty-eight rather intense, and which has 
slowed, somewhat, our economic activity in 1960. 2020. © Shohreh Haddadan. 


One approach toward extraction of structures of this type from plain textual re- 
sources is to implement an argument mining pipeline - this is considered the 
main methodology in this field of research. 

In my research, I focus on answering the following questions: 

- How can political debate transcript data be defined in the argument analy- 
sis domain? 

— Is an artificial intelligent agent in the form of a computer program able to re- 
produce the thought processes of a human in structuring an argument, given 
the political debate text data? And what aspects of argument structure should 
an algorithm learn in order to reshape a political debate transcript into argu- 
ment structures illustrating how candidates formulate their arguments in 
debates? 

— What means of analysis can argument structures extracted from text pro- 
vide for media historians, political scientists, or social scientists? 
2 Interdisciplinary aspect of the project


Argument mining research is by definition an interdisciplinary study. The two 
disciplines from which this area of research has originated are argumentation 
theory and NLP. Argumentation theory analyses the nature of arguments from 
a logical perspective. The basis of this research field is the study of argument 
structures. Strictly speaking, argumentation theory is in itself an
interdisciplinary field composed of rhetoric and logic.¹

Furthermore, the research applies NLP techniques in order to extract what 
we define as arguments from language resources which potentially contain 
arguments. 

Thirdly, the dataset determines the field(s) of science in which the results 
are interpreted, in this case contemporary history, political science, and public 
discourse. Zarefsky describes the study of public discourse as a “subfield [that] 
has developed [. . .] that is devoted to the historical-critical study of specific 
texts or moments of rhetorical significance,”² - moments such as
those of presidential debates. As will be discussed in Section 3, the dataset at 
hand is a collection of debates from the election periods in the United States 
between 1960 and 2016. The interpretation of this dataset from the perspective 
of how the arguments have been shaped, changed, reformed, and evolved dur- 
ing a relatively short period of time falls into the field of history, as well as that 
of political sciences. Thus, the interdisciplinary aspect of this research project 
touches on three main fields at various levels of basic definition, methodology 
and practical interpretation. 

Zarefsky points out two kinds of goals in the study of public discourse. The
first are what he calls artistic goals, where scholars investigate the dynamics
of the text to evaluate its effectiveness or persuasiveness; in argumentation
theory, this can be mapped to evaluating the strength of the arguments. The
second are historical goals, which aim at understanding how a certain public
discourse affected the flow of history.³ This goal can be mapped to evaluating,
from a distant reading view, how different topics have been structurally formed
within arguments in the debates over time.


1 Manfred Stede and Jodi Schneider, “Argumentation mining,” Synthesis Lectures on Human 
Language Technologies 11, no. 2 (2018): 1-191. 

2 David Zarefsky, Political argumentation in the United States: Historical and contemporary 
studies. Selected essays by David Zarefsky (Amsterdam: John Benjamins Publishing, 2014), 2. 

3 David Zarefsky, “Political argumentation,” 2. 
We therefore expect that the extraction of the structure of arguments can
indeed provide a tool for historians and public discourse analysts to facilitate 
both types of goals mentioned. 


3 Annotation 


Annotation bridges various fields of science by adding metadata or knowledge 
from other perspectives to plain data (e.g. text, images). NLP profits from many 
linguistic annotation schemes applied to plain textual resources, such as anno- 
tation of parts of speech, various parsing syntaxes, and even semantic and 
pragmatic level annotations. 

Furthermore, the field of digital humanities benefits from annotated resources 
arising from mark-up annotation schemes such as reviews of cultural artifacts.⁴ It
also connects computational linguistics with the field of digital humanities — for 
instance, Schmidt uses semantic annotation of named entities to facilitate a
historical study of German plays in order to interpret their role in historical narratives.⁵

New machine learning techniques like deep learning are known for their 
data-devouring characteristics. Annotation is the means of providing them with 
the data they need to consume to enable them to recognize patterns using their 
generalization and specification algorithmic mechanisms. 

In this research, I first had to specify an annotation scheme to represent 
argument structures so that it could be applied to the data. 

The structure of arguments may vary depending on the domain in question. 
For instance, in an argument essay, students describe their stance for or against 
a predefined issue (referred to as the major claim) to outline their thought pro- 
cesses and structure them for the teacher or the readers of the article. However, 
televised debates for the presidential elections take place in a competitive atmo- 
sphere, with arguments being made for self-promotion purposes in this adversar- 
ial context. The debaters support their claims in each monological argument and 
other debaters attack (or in rare cases support) those arguments in a dialogical 


4 Kristin Kutzner et al., “Reviews of Cultural Artefacts: Towards a Schema for their Annota- 
tion,” (Workshop on Annotation in Digital Humanities conference co-located with ESSLLI, 
Sofia, 2018), 17-23. 

5 Thomas Schmidt, Manuel Burghardt, and Katrin Dennerlein. “Sentiment annotation of his- 
toric German plays: An empirical study on annotation behavior,” (Annotation in Digital Hu- 
manities conference, Sofia, 2018), 47-52. 
setting. Therefore, the annotation scheme used will also vary according to the
setting of the argumentation. 

I discuss selection of the annotation scheme from three aspects: reproduc- 
ibility, practicality, and refinement. 

An annotation scheme needs to be reproducible independent of the annota- 
tor and the annotation platform. In order to bring about reproducibility in any 
annotation scheme on a language corpus, measuring methods are proposed 
during the annotation process. 

The practicality of the annotation denotes the applicability of the extracted 
structure in the required domains. For instance, the persuasive essay project 
has defined an argument scheme which incorporates “major claim” as an argument
component, whereas, as I discuss later, in most cases there are no explicit
major claims in dialogical arguments in our political debate dataset.⁶

The process of interchangeably converting one annotation scheme to an- 
other, in order to make use of various datasets as inputs to reasoning engines, 
can be handled if the annotation scheme has the capacity for refinement. A sim- 
ple annotation scheme may later be expanded on matters such as the distinc- 
tions between relations, or the classification of components. 

These three aspects guide us in selecting an appropriate annotation scheme 
while considering the trade-off between a scheme simple enough for annotators 
and the incorporation of enough information for further practical purposes. 

The selected annotation scheme is composed of two classes of argument
component, namely claims and premises, and two classes of relations, support
and attack, which connect the argument components to form the structure of the
argument. This annotation scheme can serve in the monological
speeches made by each candidate in the debates to represent the structure of 
their argument as shown in the argument structure diagram in Fig. 1. 
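
A scheme of this kind can be pictured as a small typed graph. The following Python sketch is illustrative only: the names `Component`, `Relation`, and `ArgumentGraph` are assumptions made for this example and not the data model used in the study, and the support link between the two Dukakis utterances (labelled as in Fig. 2) is assumed for the sake of illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

COMPONENT_LABELS = ("claim", "premise")
RELATION_KINDS = ("support", "attack")


@dataclass(frozen=True)
class Component:
    """One argument component: a text span labelled as claim or premise."""
    cid: str
    label: str
    text: str


@dataclass(frozen=True)
class Relation:
    """A directed link from a source component to the component it targets."""
    source: str
    target: str
    kind: str


@dataclass
class ArgumentGraph:
    components: Dict[str, Component] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def add_component(self, cid: str, label: str, text: str) -> None:
        if label not in COMPONENT_LABELS:
            raise ValueError(f"unknown component label: {label!r}")
        self.components[cid] = Component(cid, label, text)

    def add_relation(self, source: str, target: str, kind: str) -> None:
        if kind not in RELATION_KINDS:
            raise ValueError(f"unknown relation kind: {kind!r}")
        if source not in self.components or target not in self.components:
            raise KeyError("both ends of a relation must be known components")
        self.relations.append(Relation(source, target, kind))

    def incoming(self, cid: str, kind: str) -> List[Component]:
        """Components that point at `cid` with the given relation kind."""
        return [self.components[r.source] for r in self.relations
                if r.target == cid and r.kind == kind]


def dukakis_example() -> ArgumentGraph:
    """Two components from Fig. 2; the support link is an assumption."""
    graph = ArgumentGraph()
    graph.add_component("C1", "claim",
                        "I've opposed the death penalty during all of my life")
    graph.add_component("P1", "premise",
                        "I don't see any evidence that it's a deterrent")
    graph.add_relation("P1", "C1", "support")
    return graph
```

Keeping the label sets in two small tuples means that a later refinement, for example further relation types, only needs to extend those tuples rather than the graph logic.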

Moreover, the relations can be extended to depict supporting or attacking 
arguments in candidates’ speeches in the dialogical setting of political debates. 
Figure 2 shows an argument diagram extracted from the annotated dataset. 
Each monologue speech is annotated with argument components and the relations
between them. Relations across speeches capture how argument components
connect in a dialogical setting.


6 Christian Stab and Iryna Gurevych, “Annotating argument components and relations in per- 
suasive essays,” (COLING, the 25th international conference on computational linguistics: 
Technical papers, Dublin, 2014), 1501-10. 
[Figure 2 diagram. Dukakis: Claim 1 “I’ve opposed the death penalty during all of my life”; Premise 1 “I don’t see any evidence that it’s a deterrent”; Claim 2 “I think there are better and more effective ways to deal with violent crime”; Premise 2 “We’ve done so in my own state”; Premise 3 “it’s one of the reasons why we have had the biggest drop in crime of any industrial state in America”; Premise 4 “why we have the lowest murder rate of any industrial state in America.” Bush: claims “I do believe in the death penalty,” “for those real brutal crimes, I think it is a deterrent,” and “I believe we need it”; Premise 1 “I do believe that some crimes are so heinous, so brutal, so outrageous, and I’d say particularly those that result in the death of a police officer.”]


Fig. 2: Argument structure diagram using a claim-premise annotation scheme. Claims are 
depicted in white rectangles; shaded rectangles represent premises. Arrow-headed 
connectors represent support relations; circle-headed dotted connectors show attack 
relations. 2020. © Shohreh Haddadan. 


3.1 Dataset 


Data was gathered from the website of the Commission on Presidential Debates 
(CPD),⁷ which provides transcriptions of the debates held among the leading


7 “The Commission on Presidential Debates,” accessed July 26, 2021. https://www.debates.org. 
candidates for the presidential and vice-presidential offices in the US. This
organization is a non-profit which has been responsible for regulating the debates
leading up to the US presidential elections since 1987. The website also con- 
tains transcripts of debates held earlier than the establishment of the CPD. For 
this study, transcripts of the televised debates which were broadcast from 1960, 
between Kennedy and Nixon (the earliest such debate), until 2016 between 
Clinton and Trump, were selected. The dataset consists of 42 transcripts be- 
tween major party candidates, divided in 12 different election years. In the 
years 1964, 1968, and 1972, no debates were held between the major candidates 
from the Republican and Democratic parties. 

Table 1 summarizes the size of the dataset with respect to the number of 
speech turns, sentences, and tokens during all debates in each year of the pres- 
idential elections. This dataset has significant features such as its size (well 
over 6,000 turns, over 38,000 sentences, and nearly 690,000 tokens), its pecu- 
liar nature of containing reciprocal discussions, and its time line structure. 


Tab. 1: Raw dataset of transcripts, number of turns, sentences and tokens in the dataset. 
2020. © Shohreh Haddadan. 


Year   Debates            Candidates                    Turns   Sentences    Tokens
1960   4 pres             Kennedy - Nixon                 257       2,313    48,445
1976   3 pres             Carter - Ford                   270       2,090    46,583
1980   2 pres             Anderson - Carter - Reagan      201       1,247    28,775
1984   2 pres + 1 vice    Mondale - Reagan                362       2,605    49,574
1988   2 pres + 1 vice    Bush - Dukakis                  491       2,828    53,202
1992   3 pres + 1 vice    Bush - Clinton - Perot          928       4,713    78,878
1996   2 pres + 1 vice    Clinton - Dole                  280       2,381    39,090
2000   3 pres + 1 vice    Bush - Gore                     564       3,331    55,320
2004   3 pres + 1 vice    Bush - Kerry                    598       4,806    78,310
2008   3 pres + 1 vice    McCain - Obama                  669       3,849    76,591
2012   3 pres + 1 vice    Obama - Romney                1,102       4,997    82,921
2016   3 pres + 1 vice    Clinton - Trump                 944       3,171    50,565

Total  33 pres + 9 vice                                 6,666      38,331   688,254
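
The totals in Table 1 can be checked with a few lines of arithmetic. The Python sketch below simply re-adds the per-year figures copied from the table; the names `ROWS` and `column_totals` are introduced here for illustration only.

```python
# (year, turns, sentences, tokens), copied from Table 1
ROWS = [
    (1960, 257, 2313, 48445),
    (1976, 270, 2090, 46583),
    (1980, 201, 1247, 28775),
    (1984, 362, 2605, 49574),
    (1988, 491, 2828, 53202),
    (1992, 928, 4713, 78878),
    (1996, 280, 2381, 39090),
    (2000, 564, 3331, 55320),
    (2004, 598, 4806, 78310),
    (2008, 669, 3849, 76591),
    (2012, 1102, 4997, 82921),
    (2016, 944, 3171, 50565),
]


def column_totals(rows):
    """Sum turns, sentences, and tokens over all election years."""
    turns = sum(row[1] for row in rows)
    sentences = sum(row[2] for row in rows)
    tokens = sum(row[3] for row in rows)
    return turns, sentences, tokens


if __name__ == "__main__":
    turns, sentences, tokens = column_totals(ROWS)
    print(turns, sentences, tokens)
    # Average sentence length in tokens, a rough descriptive figure
    print(round(tokens / sentences, 2))
```

The sums agree with the "Total" row of the table (6,666 turns, 38,331 sentences, 688,254 tokens).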
3.2 Annotation tool


For this study I chose the brat annotation tool. This is an open-source web- 
based tool which provides functionality for annotating text collaboratively. Brat 
is a platform in which text segments can be annotated at character level; it is
thus suitable for tasks in which annotation boundaries are not limited to
sentences (Fig. 3).

In order to facilitate setting up annotation at the workstations of several an- 
notators, brat provides a server code snippet. To configure the brat server, the 
annotation manager defines the annotations for entities, events, and relations, 
depending on the annotation scheme. For this annotation task, the brat
annotation standalone server software was set up on the university domain.⁸

The mark-ups by annotators are saved in text files with a dedicated extension
for annotation files: .ann. The annotation identification number, the offsets
of the beginning and end of the text segments, and the annotation labels are
written to these files, saved on the server file system, in a standoff format.
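
To make the standoff format concrete: in brat's .ann files each line holds one annotation, with tab-separated fields. Text-bound annotations have IDs beginning with `T` and carry a label plus start and end character offsets; binary relations have IDs beginning with `R` and name their two arguments. The Python sketch below reads such a file. The label names (`Claim`, `Premise`, `Support`) and the sample sentence pairing are assumptions chosen for illustration, and discontinuous spans are deliberately skipped.

```python
from typing import Dict, List, NamedTuple, Tuple


class Span(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


class Link(NamedTuple):
    ann_id: str
    label: str
    source: str
    target: str


def parse_ann(content: str) -> Tuple[Dict[str, Span], List[Link]]:
    """Parse text-bound (T) and relation (R) lines of a brat .ann file."""
    spans: Dict[str, Span] = {}
    links: List[Link] = []
    for line in content.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        ann_id, body = fields[0], fields[1]
        if ann_id.startswith("T") and ";" not in body:
            # e.g. "Claim 0 25"; spans with ";" are discontinuous and skipped
            label, start, end = body.split(" ")
            text = fields[2] if len(fields) > 2 else ""
            spans[ann_id] = Span(ann_id, label, int(start), int(end), text)
        elif ann_id.startswith("R"):
            # e.g. "Support Arg1:T2 Arg2:T1"
            label, arg1, arg2 = body.split(" ")
            links.append(Link(ann_id, label,
                              arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return spans, links


# Hypothetical annotation of two sentences from example A
TEXT = "And we've been effective. We busted the A.Q. Khan network."
SAMPLE_ANN = (
    "T1\tClaim 0 25\tAnd we've been effective.\n"
    "T2\tPremise 26 58\tWe busted the A.Q. Khan network.\n"
    "R1\tSupport Arg1:T2 Arg2:T1\n"
)
```

Because the offsets refer to the untouched source text, slicing `TEXT[span.start:span.end]` recovers each annotated segment, which is what allows boundaries to fall anywhere inside a sentence.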


And Iraq is not even the center of the focus of the war on terror. 

The center is Afghanistan, where, incidentally, 

there were more Americans killed last year than the year before; 
Premise 

where the opium production is 75 percent of the world's opium production; 
Premise 

where 40 to 60 percent of the economy of Afghanistan is based on opium; 


where the elections have been postponed three times. 


Fig. 3: Mock-up of a text segment from the dataset, annotated with premises and claims. 
2020. © Shohreh Haddadan. 


8 Pontus Stenetorp et al. “BRAT: a web-based tool for NLP-assisted text annotation.” (Dem- 
onstrations at the 13th Conference of the European Chapter of the Association for Computa- 
tional Linguistics, Avignon, 2012), 102-7; Brat annotation tool: brat.uni.lu:8001. 
3.3 Annotation cycle


In this section I describe how I carried out the annotation of the dataset of
US presidential election debate transcripts from 1960 to 2016.

The annotation process was divided into two stages. In the first stage, three
non-expert annotators annotated the dataset with argument components using the
brat annotation software.

I devised annotation guidelines for three non-expert annotators to perform 
the annotation task. The guidelines described the annotation scheme in which 
arguments consist of argumentative discourse units, classified as claims and 
premises. In each annotation cycle (Fig. 4) I evaluated reproducibility based on 
qualitative and quantitative measures to improve the annotation. The qualitative 
analysis included looking at the disagreements of annotators on the same data, 
in order to improve the annotation guidelines,⁹ and for the quantitative analysis
I computed the inter-annotator agreement as the average of Cohen’s
kappa over each pair of annotators.¹⁰
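
As an illustration of the quantitative measure, the sketch below computes Cohen's kappa for two label sequences and averages it over all annotator pairs. It is a minimal Python sketch under the assumption that every annotator labelled the same units in the same order; the function names and the toy labels are hypothetical and do not reproduce the study's evaluation code.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same units."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(a)
    # Observed agreement: share of units with identical labels
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label distribution
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label]
              for label in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotations: Dict[str, Sequence[str]]) -> float:
    """Average kappa over every unordered pair of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        raise ValueError("at least two annotators are required")
    return sum(cohen_kappa(x, y) for x, y in pairs) / len(pairs)
```

For instance, with labels `["claim", "claim", "premise", "premise"]` and `["claim", "premise", "premise", "premise"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.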

The annotation cycle consists of following the stages of the guidelines in 
annotating the argument components.¹¹ In further studies, the guidelines were
developed to add the annotation of relations between the components. 

The annotation scheme adopted in my research considers the argumenta- 
tive discourse units (ADUs), their distinction as claims or premises, and the re- 
lations between them which form the structure of the arguments. Relations are 
further classified into support and attack relations. 

Limitations were set — for example, each ADU could either be a claim or a 
premise but not both, and no more than one outgoing relation from each com- 
ponent could be valid. In order for the argument to be structured, the annota- 
tion scheme also used relations in either support or attack form between the 


9 Milagro Teruel et al., “Increasing argument annotation reproducibility by using
inter-annotator agreement to improve guidelines.” (Eleventh International Conference on Language
Resources and Evaluation, LREC, Miyazaki, Japan, 2018), 4061-64. 

10 Ron Artstein and Massimo Poesio. “Inter-coder agreement for computational linguistics.” 
Computational Linguistics 34, no. 4 (2008): 555-96. 

11 Shohreh Haddadan, Elena Cabrio, and Serena Villata. “Annotation of argument compo- 
nents in political debates data.” (Workshop on Annotation in Digital Humanities conference 
co-located with ESSLLI, Sofia, 2018), 12-6. 
argumentative utterances. In the following, I mention some examples from the
guidelines on how to identify the annotation concepts.¹²
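
The limitations stated above (each ADU is a claim or a premise but not both, and each component has at most one outgoing relation) lend themselves to an automatic check. The following Python sketch is one possible formulation; the input layout (a dictionary of labels per ADU and a list of `(source, target, kind)` triples) is an assumption made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def check_constraints(labels: Dict[str, Iterable[str]],
                      relations: List[Tuple[str, str, str]]) -> List[str]:
    """Return human-readable violations of the scheme's limitations."""
    problems: List[str] = []
    # Each ADU carries exactly one label, either claim or premise
    for adu, assigned in labels.items():
        assigned_set = set(assigned)
        if len(assigned_set) != 1 or not assigned_set <= {"claim", "premise"}:
            problems.append(
                f"{adu}: expected exactly one of claim/premise, "
                f"got {sorted(assigned_set)}")
    # No component may have more than one outgoing relation
    outgoing = Counter(source for source, _target, _kind in relations)
    for adu, count in outgoing.items():
        if count > 1:
            problems.append(f"{adu}: {count} outgoing relations, at most one allowed")
    # Relations are either support or attack
    for source, target, kind in relations:
        if kind not in ("support", "attack"):
            problems.append(f"{source}->{target}: unknown relation kind {kind!r}")
    return problems
```

Running such a check after each annotation round would flag scheme violations separately from genuine disagreements between annotators.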

The main purpose of an argument is to derive a conclusion or justify a 
claim. In political debates, claims are uttered for the purpose of defending a 
policy that a candidate or their party advocates, or a stance for or against a con- 
troversial subject, or even personal judgments. 

Example A is one of the many cases where the candidate is defending a policy
of their own. Claims of this type also include supporting policies of the admin- 
istration the candidates are associated with or claims against the policy their 
opponent is representing. 


Example A: Bush - Kerry, 30 September 2004 

BUSH: My administration started what’s called the Proliferation Security Initiative. Over 60 na- 
tions involved with disrupting the trans-shipment of information and/or weapons of mass de- 
struction materials. And we’ve been effective. We busted the A.Q. Khan network. This was a 
proliferator out of Pakistan that was selling secrets to places like North Korea and Libya. We 
convinced Libya to disarm. 


Taking a stance toward a controversial subject, or expressing an opinion to- 
ward a specific issue is also considered as a claim. In example B, Dukakis op- 
poses the death penalty, a controversial topic in US presidential elections. 


Example B: Bush - Dukakis, 13 October 1988 
DUKAKIS: . . . I’ve opposed the death penalty during all of my life. I don’t see any evidence that
it’s a deterrent and I think there are better and more effective ways to deal with violent crime. 


In some cases, the explicit choice of expressions indicates the nature of the argu- 
ments. A useful clue for identifying claims in speeches is to find some indicators 
which are usually exploited to state opinions or judgments, or to form a conclusion 
such as “I think,” “in my judgment” and “in my opinion.” However, the presence 
of these expressions does not guarantee the presence of a claim, and
the candidates do not necessarily use these indicators to assert their claims: in
example B, “I think” is used in expressing a premise rather than a claim.

Premises are utterances asserted by the debaters to back up their claims. A 
premise is a reason or justification for a claim. One type of premise consistently
used by candidates contains references to the past: more experienced candidates


12 In the examples, claims are marked in bold, premises in italics, and the component bound- 
aries by [square brackets]. 
occasionally exploit this factor to argue that their claims are more relevant, given
their expertise, than their opponents’ (example C illustrates this kind of premise).


Example C: Carter - Ford, 23 September 1976 

CARTER: Well among my other experiences in the past, I’ve - I’ve been a nuclear engineer, and 
did graduate work in this field. I think I know the — the uh capabilities and limitations of
atomic power. 


Statistics are very commonly used as evidence for the justification of claims, as 
in example D. 


Example D: Clinton - Dole, 6 October 1996 

CLINTON: We have the biggest drop in the number of people in poverty in 27 years... . The 
average family’s income has gone up over $1,600 just since our economic plan passed. So 
I think it’s clear that we’re better off than we were four years ago.


Premises may be asserted in the form of examples to prove that a claim is justi- 
fied, as seen in example E. 


Example E: Carter - Ford, 6 October 1976 

FORD: I believe that we have uh - negotiated with the Soviet Union since I’ve been president 
from a position of strength. And let me cite several examples. Shortly after I became president
in uh — December of 1974, I met with uh — General Secretary Brezhnev in Vladivostok and we
agreed to a mutual cap on the ballistic missile launchers at a ceiling of twenty-four hundred. . . 


Premises may also be accompanied by indicators which help detect the exem- 
plification and justification of a claim. Some of these indicators are “because,” 
“since,” and “that’s why.” 
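
The indicator phrases mentioned in the guidelines can be located mechanically, although, as noted above, their presence is only a hint and never decides the label. The Python sketch below flags the cues quoted in this section; the function name `find_cues` and the pairing of each cue with a "claim" or "premise" hint are assumptions for illustration.

```python
import re
from typing import List, Tuple

# Cue phrases quoted in the annotation guidelines
CLAIM_CUES = ("i think", "in my judgment", "in my opinion")
PREMISE_CUES = ("because", "since", "that's why")


def find_cues(sentence: str) -> List[Tuple[str, str]]:
    """Return (hint, cue) pairs for every indicator found in the sentence."""
    # Normalise case and typographic apostrophes before matching
    lowered = sentence.lower().replace("\u2019", "'")
    hits: List[Tuple[str, str]] = []
    for hint, cues in (("claim", CLAIM_CUES), ("premise", PREMISE_CUES)):
        for cue in cues:
            # Word boundaries keep "since" from matching inside "sincerely"
            if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
                hits.append((hint, cue))
    return hits
```

In example B, such a matcher would mark “I think” as a claim hint even though the surrounding text treats that utterance as a premise, which illustrates why the cues can support annotators but cannot replace their judgment.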


3.4 Annotation results 


In order to compute inter-annotator agreement as the quantitative measure of
the reproducibility of the annotated dataset, 19 of the debate transcriptions
were annotated by two of the annotators and their agreement was reported.

Observed agreement of annotators on whether a sentence contained an
argumentative segment or not was 83%; Cohen’s kappa was κ = 0.57, which
is considered moderate agreement. For the argument components, that is,
whether an argumentative unit is a claim or a premise, the average kappa
coefficient was κ = 0.4 (fair agreement).
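
The gap between 83% observed agreement and a kappa of 0.57 follows from the way kappa discounts agreement expected by chance. As a worked check, writing p_o for observed agreement and p_e for chance agreement (symbols introduced here, not in the original), and assuming both reported figures refer to the same sentence-level decision:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Longrightarrow\quad
p_e = \frac{p_o - \kappa}{1 - \kappa}
    = \frac{0.83 - 0.57}{1 - 0.57}
    = \frac{0.26}{0.43} \approx 0.60
```

Under that assumption, roughly 60% agreement would be expected by chance alone, so only the agreement beyond that level counts toward kappa.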
[Figure 4 diagram: a cycle with the stages Reading Guidelines, Annotation, Evaluation, and Revision.]


Fig. 4: The annotation cycle depicting stages of annotation until a fair agreement is reached. 
2018. © Haddadan, Shohreh et al. “Annotation of argument components”, 14. 


3.5 Annotation challenges inherent to the dataset 


Observing the inter-annotator agreement on argument components, I discuss 
the sources of disagreement during the annotation cycle. Uncovering the sour- 
ces of disagreement between annotators in the early stages facilitates the revi- 
sion of the guidelines for further repetitions of the annotation cycle, and also 
allows for easier refinements later on. 


— Context-based claims: The task of identifying premises and claims in public
discourse is highly subjective, which also results in a high disagreement
percentage in the annotation of the argument components. Consider example F,
for instance: the phrase “Communism is the enemy of all religions” is
provided to support the claim that “we who do believe in God must join
together.” Although there is no justification as to why that is a true
statement, Nixon uses it as a premise.


Example F: Nixon — Kennedy, 13 October 1960 
NIXON: Communism is the enemy of all religions; and we who do believe in God must join 
together. We must not be divided on this issue. 


— Implicit claims: Claims are sometimes made implicitly. In example G, Nixon
states that “it would be rather difficult” to cover his proposals in a short time,
which implicitly indicates that he has a lot of relevant experience. After this
he mentions a few of his travels abroad during his vice presidency; however,
the premises he uses are not related to an explicit claim of “I have sufficient
relevant experience.”


Example G: Kennedy - Nixon, 26 September 1960 

NOVINS: Would you tell us please specifically what major proposals you have made in the last 
eight years that have been adopted by the administration? 

NIXON: It would be rather difficult to cover them in eight and - in two and a half minutes. | 
would suggest that these proposals could be mentioned. 


— Absence of major claims: In general, the arguments do not have any major
claims. In the few cases where a controversial issue, such as the death
penalty, legalization of abortion, or gun control, is being discussed and a
major claim can be identified, it can be discerned from the question asked by
the moderator.

— Macro relations: Since I chose a micro-level annotation scheme rather than
a macro-level one, some of the relation annotations could be
lost in the annotation process. An argument component can attack a complete
argument made previously by another candidate, for which a single
component cannot be specified as being related. Menini et al.¹³ annotate the
relation between two separate monologue debates as supporting or attacking 
each other. However, the annotation scheme cannot capture the relation of 
the following statement (example H) with a specific argument component 
from the speech of the other candidate. 


Example H: Reagan — Mondale, 21 October 1984 
REAGAN: I’m not going to continue trying to respond to these repetitions of the falsehoods 
that have already been stated here. 


— Relation spans: The length of each speech turn made it challenging
to identify the argumentative units and the relations across these
components. In order to overcome this challenge, I divided each debate session
into sections, at each change of subject initiated by the moderator.
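
The sectioning step can be sketched in Python as follows. This is a simplification under a stated assumption: every moderator turn is treated as opening a new subject, whereas in the transcripts a moderator may also speak without changing the subject. The function name and the `(speaker, text)` input layout are hypothetical.

```python
from typing import List, Set, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def split_into_sections(turns: List[Turn], moderators: Set[str]) -> List[List[Turn]]:
    """Start a new section at every moderator turn (simplifying assumption)."""
    sections: List[List[Turn]] = []
    current: List[Turn] = []
    for speaker, text in turns:
        if speaker in moderators and current:
            # A moderator speaks: close the running section
            sections.append(current)
            current = []
        current.append((speaker, text))
    if current:
        sections.append(current)
    return sections


# Toy input with invented utterances; speaker names follow example G
TURNS = [
    ("NOVINS", "first question"),
    ("NIXON", "first answer"),
    ("KENNEDY", "comment"),
    ("NOVINS", "second question"),
    ("KENNEDY", "second answer"),
]
```

With this toy input the function yields two sections of three and two turns; relations would then only need to be searched within a section rather than across the whole debate.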


13 Stefano Menini et al., “Never retreat, never retract: Argumentation analysis for political 
speeches.” (32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018), 4889-96.
3.6 Annotation refinement


The steps described above resulted in an annotated dataset that fulfilled the re- 
quirements of basic argument structures. However, for further development of 
the dataset I suggest some refinements which can be implemented with regard 
to the challenges mentioned in Section 3.5. 

Each speech from a candidate can be regarded as a macro-level argument,
which later arguments may generally support or attack, or toward which they
may remain neutral.

Each section of the debate, between which the moderator changes the sub- 
ject, can be identified with a major claim. The major claim may pertain to the 
question asked or the summary of the argument taking place. 

One other aspect I took into account in choosing my annotation scheme 
was how straightforward it would be to transform the scheme. One such type of 
transformation is to break down the higher-level annotation labels into finer 
concepts. Contrary to the micro-text scheme used by Peldszus and Stede,¹⁴ I
made no distinction between different types of attack, but this distinction can 
be added later as a refinement step to the annotation process. 

In my chosen annotation scheme, the annotation of components was limited 
to the identification of argumentative versus non-argumentative utterances. Sub- 
sequently, it is possible to mark a distinction between which argumentative ut- 
terances are put forward as claims or conclusions, and which are put forward as 
evidence which embodies the premises of the arguments. Classification of claims 
as epistemic, practical, or moral — and premises as study, expert, or anecdotal — 
can also be applied to the annotation.¹⁵


4 Argument mining pipeline 


As mentioned before, argument mining is the extraction of argument structures 
from argument resources. One of the most prominent frameworks for argument 
mining is to deconstruct the methodology into stages and bring these together 
as a pipeline. 


14 Andreas Peldszus and Manfred Stede, “From argument diagrams to argumentation mining 
in texts: A survey,” International Journal of Cognitive Informatics and Natural Intelligence
(IJCINI) 7, 1 (2013): 1-31.

15 Marco Lippi and Paolo Torroni, “Argument Mining from Speech: Detecting Claims in Political
Debates,” (30th AAAI Conference on Artificial Intelligence, Phoenix, USA, 2016), 2979-85.

These stages include:

- identification of argument boundaries (distinction between argumentative 
vs. non-argumentative utterances) 

- classification of argumentative utterances into component types (which, in 
the selected annotation scheme for this study, include claims and premises) 

- reconstruction of the structure of the argument (in this research, from
plain text resources) by identifying the relations between the argument
components.¹⁶


The first two stages together form component detection, and the last stage
performs argument structure prediction. Each stage is fed by the output of
the previous stage, with the annotated dataset used as the input for the
first stage.

In this research, multiple NLP methods were applied¹⁷ at each stage of the
pipeline.
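
The staged design described above can be sketched as three functions chained together. This is only an illustration of how each stage feeds the next, not the implementation used in this research: the function names, the cue words, and the rule that links a premise to the nearest preceding claim are hypothetical placeholders standing in for trained models.

```python
import re
from dataclasses import dataclass

# Hypothetical cue words standing in for trained classifiers.
PREMISE_CUES = ("because", "since")
ARGUMENT_CUES = PREMISE_CUES + ("should", "must", "think")


@dataclass
class Component:
    text: str
    label: str = "unknown"  # set to "claim" or "premise" in stage 2


def tokens(text):
    """Lowercase word tokens of an utterance."""
    return re.findall(r"[a-z']+", text.lower())


def detect_boundaries(utterances):
    """Stage 1: keep only the utterances judged argumentative."""
    return [Component(u) for u in utterances
            if any(cue in tokens(u) for cue in ARGUMENT_CUES)]


def classify_components(components):
    """Stage 2: label each argumentative unit as claim or premise."""
    for c in components:
        is_premise = any(cue in tokens(c.text) for cue in PREMISE_CUES)
        c.label = "premise" if is_premise else "claim"
    return components


def predict_relations(components):
    """Stage 3: link each premise to the nearest preceding claim."""
    relations, last_claim = [], None
    for i, c in enumerate(components):
        if c.label == "claim":
            last_claim = i
        elif last_claim is not None:
            relations.append((i, "supports", last_claim))
    return relations


def run_pipeline(utterances):
    """Feed the output of each stage into the next one."""
    components = classify_components(detect_boundaries(utterances))
    return components, predict_relations(components)


# Example I from Section 4.1, split by hand into two units, plus an
# invented non-argumentative opening line.
components, relations = run_pipeline([
    "Good evening and welcome",
    "I think states should do that for new handguns",
    "because too many criminals are getting guns",
])
```

In a real pipeline each stage would be a model trained on the annotated dataset, and errors made in an early stage would propagate to the later ones.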


4.1 Component identification and detection 


The boundary detection problem can be viewed from two perspectives. Relaxing
the boundaries and confining them to whole sentences would reduce the problem
of boundary detection to the classification of sentences as argumentative or
non-argumentative. On the other hand, there are good reasons for treating
boundary detection not at the sentence level but at the token level. Firstly,
the dataset is a transcribed dialogue, in which what counts as a sentence
depends on how the speech was transcribed and later edited. Secondly, in
previous studies, supporting or attacking argument components have been
defined inside the boundaries of one sentence, and in some cases there have
been correspondences between argument relations and discourse analysis which
might also occur inside the boundaries of sentences,¹⁹ such as in example I.
Finally, there are a few


16 Marco Lippi and Paolo Torroni, “Argumentation mining: State of the art and emerging 
trends.” ACM Transactions on Internet Technology (TOIT) 16, 2 (2016): 1-25.

17 Shohreh Haddadan, Elena Cabrio, and Serena Villata, “Yes, we can! Mining Arguments in 
50 Years of US Presidential Campaign Debates,” (57th Annual Meeting of the Association for 
Computational Linguistics, Florence, Italy, 2019), 4684-90. 

18 Christian Stab and Iryna Gurevych, “Parsing argumentation structures in persuasive es- 
says.” Computational Linguistics 43, 3 (2017): 619-59. 

19 Elena Cabrio, Sara Tonelli, and Serena Villata, “From Discourse Analysis to Argumentation 
Schemes and Back: Relations and Differences,” in Computational Logic in Multi-Agent Systems, 
ed. João Leite et al., CLIMA 2013. Lecture Notes in Computer Science, vol 8143. (Berlin: Springer,
2010), 1-17.

cases in the annotated dataset where the boundaries of one argument exceed the
limits of what is identified as a sentence. Example J, for instance, shows
how a component crosses the boundaries of a so-called sentence.


Example I: Bush - Gore, 11 October 2000 
GORE: I think states should do that for new handguns, because too many criminals are getting 
guns 


Example J: Obama — McCain, 7 October 2008 
McCain: - at the diminished value of those homes and let people be able to make those - be 
able to make those payments and stay in their homes. Is it expensive? Yes.


Component classification, which follows either sentence-level or token-level
argument boundary detection, is carried out as a text classification task.

I implemented several methods to detect sentence-based and token-based
component boundaries. In this section, I focus on just one of the supervised
machine learning methods I applied: the support vector machine (SVM), which
is used in NLP applications to classify text based on extracted features.

Statistical machine learning methods (as opposed to rule-based methods,
which define explicit rules to identify a pattern in text) use statistical
and mathematical techniques to extract patterns from text and generalize
these patterns to text which they have not previously observed. In supervised
machine learning methods, unlike in unsupervised methods, the training data
are already annotated with the target classes (in the case of this research:
argumentative vs. non-argumentative sentences and claim vs. premise
sentences).

The first step in using this method is to transform the text sentences into
vectors of features. A set of features is extracted and used for
classification. It includes lexical features such as word frequencies, the
importance of a term in a document (based on the tf-idf measure), and
n-grams; linguistic features such as parts of speech and the syntax of
sentences; and some features pertaining to the indicators of components.

To apply the SVM method to the data, I used the Python library
scikit-learn,²⁰ first to transform the extracted features into numerical
vectors (vectorization) and then to train the SVM learner on the annotated


20 Fabian Pedregosa et al., “Scikit-learn: Machine learning in Python.” Journal of Machine 
Learning Research 12 (2011): 2825-30.

data. I also applied further statistical machine learning methods, including
neural-network-based methods.
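
The two steps named above, vectorization and SVM training, can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions rather than the configuration used in this research: the six sentences and their labels are invented, only tf-idf weighted unigrams and bigrams are used (the linguistic features are left out), and the linear kernel and `C=1.0` are arbitrary choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy sentences; the real training data are the annotated debates.
sentences = [
    "We should cut taxes because families need relief",
    "I believe this policy will create jobs",
    "Taxes must be lowered since growth has slowed",
    "Thank you for the question",
    "Good evening and welcome to the debate",
    "Let me turn to the next topic",
]
labels = ["arg", "arg", "arg", "non-arg", "non-arg", "non-arg"]

# Step 1 (vectorization): tf-idf weighted unigrams and bigrams.
# Step 2 (training): a linear support vector machine.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LinearSVC(C=1.0),
)
model.fit(sentences, labels)

# Classify a sentence the model has not seen before.
predictions = model.predict(["We should act because costs are rising"])
```

To try a non-linear kernel, `LinearSVC` could be swapped for `SVC(kernel="rbf")` from the same module.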

In the next section I show how I evaluated the performance of this method 
in identifying text segments according to argumentative/non-argumentative 
and claim/premise classes. 


4.2 Evaluation 


In order to evaluate supervised machine learning based methods, a dataset is 
usually divided into two sets. The first is the training set, which the algorithm 
uses to learn patterns from data. The second is the test set, which contains sam- 
ples that the algorithm will not observe until the evaluation phase. Following 
this methodology, the dataset for this research was also divided into training 
and test sets. For this purpose, 13 of the debate transcriptions were set aside as 
the test set and the rest were used in the training phase. 
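
Holding out whole transcripts, as described above, can be sketched as follows. The identifiers, the total of 40 debates, and the random selection are assumptions made for illustration; the text states only that 13 transcriptions were set aside, not how they were chosen.

```python
import random


def split_by_debate(debate_ids, n_test, seed=0):
    """Hold out whole debates so that no transcript ends up in both sets."""
    ids = sorted(set(debate_ids))
    rng = random.Random(seed)  # fixed seed so the split can be repeated
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]  # (train_ids, test_ids)


# Hypothetical identifiers; the number 40 is a placeholder, not the corpus size.
all_ids = [f"debate_{i:02d}" for i in range(40)]
train_ids, test_ids = split_by_debate(all_ids, n_test=13)
```

Splitting at the level of debates rather than sentences keeps sentences from the same exchange out of both sets at once.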

Several metrics are used to evaluate a machine learning method
quantitatively. The first is precision, which indicates what percentage of
the items that the algorithm assigned to a class actually belong to that
class. Recall measures what percentage of the items that actually belong to a
class in the test set have been correctly labeled by the algorithm. In other
words, precision describes the “validity” of the results, and recall
describes the “completeness” of the results with respect to the labels in the
annotated test set.

The F-score combines precision and recall into a single figure that is used
to evaluate the performance of a supervised machine learning algorithm
quantitatively. In the following, I report the results based on these
metrics, comparing a baseline method with the trained SVM method using two
different feature sets and kernels. A majority baseline, which assigns every
item to the most frequent class, served as the point of comparison.
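
The metrics and the majority baseline can be written out in a few lines. The sketch below assumes the balanced F1 variant of the F-score (the harmonic mean of precision and recall), which the text does not specify, and uses invented labels.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, target):
    """Precision, recall, and F1 for one class, computed from two label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def majority_baseline(train_labels, n_items):
    """Predict the most frequent training label for every item."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_items


# Invented labels, for illustration only.
gold = ["claim", "premise", "claim", "claim", "premise"]
predicted = ["claim", "claim", "claim", "premise", "premise"]

p, r, f = precision_recall_f1(gold, predicted, "claim")
baseline = majority_baseline(gold, len(gold))
bp, br, bf = precision_recall_f1(gold, baseline, "claim")
```

On the majority class the baseline reaches perfect recall, since it never misses an item of that class, but its precision equals the share of that class in the data.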

An improvement in classification results can be observed using the SVM 
classifier, compared to the majority baseline. Considering all the features in the 
feature set also improves the results for both component detection tasks. 

Feature ablation is a technique used to determine how different features
affect the results of a statistical machine learning algorithm. The
algorithm is trained with and without one of the features, and the results
are compared to evaluate the effect of removing that feature. Using this
approach, I observed that lexical features (n-grams) were the most prominent
in the identification and classification of components. These results
confirmed again the highly context-dependent nature of the task.
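
The ablation loop itself is simple and can be sketched independently of any particular classifier. In the sketch below, `evaluate` stands for "train the classifier on these feature groups and return its score". The `toy_evaluate` function and its numbers are invented placeholders, not results from this research.

```python
def ablation_study(feature_groups, evaluate):
    """Score the full feature set, then each set with one group removed."""
    full_score = evaluate(feature_groups)
    drops = {}
    for group in feature_groups:
        reduced = [g for g in feature_groups if g != group]
        drops[group] = full_score - evaluate(reduced)
    return full_score, drops


# Invented contributions: a stand-in for training and scoring a real model.
INVENTED_CONTRIBUTION = {"ngrams": 0.20, "pos_tags": 0.05, "syntax": 0.03}


def toy_evaluate(groups):
    """Placeholder scorer: a fixed base score plus one term per feature group."""
    return 0.40 + sum(INVENTED_CONTRIBUTION[g] for g in groups)


full, drops = ablation_study(list(INVENTED_CONTRIBUTION), toy_evaluate)
most_important = max(drops, key=drops.get)
```

The larger the drop in score when a feature group is removed, the more the classifier relied on that group.
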


5 Application


The main objective of my research is to provide a platform for facilitating
the analysis of the argument structures of political debates. To this end, I
have developed an argumentative analysis tool called DISPUTOOL.²¹ DISPUTOOL
makes it possible to explore debates annotated with claims and premises and
to search for argument components surrounding a keyword in different
debates. It also provides an environment for detecting argument components
in new argumentative text supplied by the user.

DISPUTOOL also integrates named entities automatically annotated by the 
Stanford CoreNLP tagger and provides the functionality to explore, filter, and 
visualize them. 


5.1 Fallacies 


One of the potential applications of extracting the argument structure of de- 
bates is to detect fallacies. 

Fallacies are arguments whose reasoning is flawed.

By using the argument structures extracted with the proposed method, some
types of fallacy can potentially be detected, for instance fallacies which
arise because the premise provided for a certain claim is not relevant to it.
Consider example K, in which Mondale attributes a “red herring” fallacy to
President Reagan: according to Mondale, the premise that Reagan provides is
irrelevant to the claim it is meant to support.


Example K: Mondale — Reagan, 7 October 1984 

MONDALE: Now, the example that the President cites has nothing to do with abortion. Some- 
body went to a woman and nearly killed her. That’s always been a serious crime and always 
should be a serious crime. 


21 Shohreh Haddadan, Elena Cabrio, and Serena Villata, “DISPUTool-A tool for the Argumen- 
tative Analysis of Political Debates,” (Twenty-Eighth International Joint Conference on Artificial 
Intelligence, Macao, 2019), 6524-6.


6 Critical reflection


This research aims at algorithmic extraction of argument structures from political 
debate data by designing an argument mining pipeline. A dataset of transcrip- 
tions of US presidential debates from 1960 to 2016 was annotated with argument 
components and the relations between them (in this chapter, I have focused only 
on the argument components). 

By applying NLP techniques, I trained a statistical machine learning
algorithm to detect argument components and evaluated the results based on
standard metrics.


6.1 Digital source criticism 


It has been argued that success in a debate depends not only on verbal skills
but also on non-verbal cues and the visual image of a public figure (such as
a politician). Analyses of persuasion that rely on text alone leave out the
influence of these non-verbal cues on the overall judgment that the audience
makes of a speaker’s personality, a judgment which also affects the
persuasiveness of the speaker’s rhetoric (arguments).

With regard to the data used in this research, the effect of visual media on
the audience’s interpretation of the debate results was most clearly
highlighted in analyses of the first televised debates between Nixon and
Kennedy in 1960, when part of the audience only heard the debate on the
radio, while others watched it on their television sets. It has been reported
that the audience who listened to the debate on the radio mostly favored
Nixon, while those watching on TV considered Kennedy the winner. This
hypothesis was later explicitly investigated in the work of Druckman.²²

Another consequence of transforming the dataset into text files is the
elimination of cues which exist in sound but not in text, such as putting
stress on a word in a sentence or using a sarcastic tone to express a claim.

In Section 4.1, I mentioned yet another aspect of this conversion: the
transformation of spoken dialogue into transcripts. The concept of a
sentence, and the boundaries between sentences, are vague in oral speech;
sentence boundaries appear only when oral speech is mapped to text.


22 James N. Druckman, “The power of television images: The first Kennedy-Nixon debate
revisited,” The Journal of Politics 65, 2 (2003): 559-71.

The critical issues above should keep us from basing the analysis of
arguments solely on argument structures.


6.2 Algorithmic criticism 


Inter-annotator agreement is used to measure how reliable an annotated
dataset is, as mentioned in Section 3.4. I was able to train a statistical
machine learning algorithm, based on an annotated dataset, with moderate
reliability. In subjective tasks we need to make sure that the machine
learning algorithm is not learning the annotators’ behavior but is truly
learning the task. This issue has been discussed before, and there are
already some solutions for eliminating annotator bias from annotated data for
NLP applications.²³ In this research, I relied solely on the comparison of
the annotators’ annotations with the expert annotation and on creating a
gold-standard dataset as the human upper limit for the task at hand.

A recent concern of the artificial intelligence (AI) community has been the
lack of transparency and explainability of complicated statistical machine
learning algorithms. Explainability can be defined as the extent to which a
human can describe the behavior of the algorithm and justify how it arrives
at its results. In recent years, with the emergence of deep learning
algorithms, this issue has become more severe, particularly in fields where
machine decisions carry ethical weight, such as health care.²⁴ Although these
complicated models often produce more accurate results, they lack
explainability and transparency. Thus, until the AI research community
solves the problem of explainable AI, a trade-off has to be made between the
explainability and the accuracy of such algorithms. In this study, I have
therefore tried to add some explainability to the algorithm by using the
feature ablation method.


23 Mor Geva, Yoav Goldberg, and Jonathan Berant, “Are We Modeling the Task or the Annota- 
tor? An Investigation of Annotator Bias in Natural Language Understanding Datasets.” (Con- 
ference on Empirical Methods in Natural Language Processing and the 9th International Joint 
Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, 2019) 1161-6. 

24 Rich Caruana et al., “Intelligible models for healthcare: Predicting pneumonia risk and
hospital 30-day readmission.” (21st ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, Sydney, 2015), 1721-30.


Ekaterina Kamlovskaya

Exploring a corpus of Indigenous 
Australian autobiographical works with 
word embedding modeling 

A methodological reflection 


1 Introduction 


Data-intensive research,¹ data-rich literary history,² distant reading,³
macroanalysis,⁴ algorithmic criticism,⁵ cultural analytics,⁶ digital literary
studies⁷: these are just some of the names that could describe the field to
which my project relates. All these names refer to the use of computational
tools and methods
to investigate research questions from the humanities and, more precisely, to 
analyze literary and historical textual sources. The research pipeline in such in- 
vestigations usually consists of standard steps like research question formula- 
tion, source identification, data collection and analysis, and visualization and 
interpretation of the results. One peculiarity, though, is that researchers often 
borrow tools and methods from a field to which they are not “native”; in most 
cases, we see historians, linguists, or literary scholars applying methods devel- 
oped in computer science. As digital tools transform traditional source types 
(e.g. text) and therefore inevitably change the way we interact with our sources, 
new perspectives may be created, new doors opened. However, phrases like “ap- 
plying computational/digital methods,” “running an algorithm on your dataset,” 
or “running your data through an algorithm” often make the research process
seem straightforward and even mechanical, as if it simply involves automating a
step that would otherwise have taken a researcher significantly more time and 
resources. Yet it is not just about automating and fast-tracking research: the im- 
plications of using digital tools concern the reliability and validity of the study, 
its results and interpretations, and therefore also of the potential contribution we 
are hoping to make. 

In this chapter I reflect on the use of one particular computational method — 
namely, word embedding modeling - to explore a humanities dataset in the 
context of my ongoing doctoral research project. I discuss the suitability of this ap- 
proach for my goals; the decisions and choices I have been making at each stage 
of the research process; the impact of my decisions on the results of a computer- 
assisted study; and the importance of digital source and digital tool criticism. 


2 Summary of the project and research questions 


My PhD study takes as its subject a collection of Indigenous Australian autobio- 
graphical narratives and is an attempt at a distant (or, rather, hybrid) reading 
of the corpus. I examine how the writers (as a collective) represent their experi- 
ences in life writing and how this representation is related to the historical, so- 
cial, and political context within which the works were created. 

The genre of Indigenous Australian life writing emerged around the 1950s,
with the rise of the Aboriginal rights movement. It is considered a
literature of significant sociopolitical and historical importance,⁸ as the
authors share an alternative history, different from the one previously
asserted by the European settlers, in which Indigenous peoples and cultures
were either misrepresented or disregarded.⁹

What exactly is said by the Indigenous Australian life writing authors in 
the corpus in relation to the most prominent themes in the genre (for example, 


8 Adam Shoemaker, Black Words, White Page: Aboriginal Literature 1929-1988 (Canberra: 
ANU E Press, 2004), 132. 

9 John Joseph Healy, “ ‘The True Life in Our History’: Aboriginal Literature in Australia,” An- 
tipodes 2, no. 2 (1988): 79-85; Oliver Haag, “From the Margins to the Mainstream: Towards a 
History of Published Indigenous Australian Autobiographies and Biographies,” in Indigenous 
Biography and Autobiography, ed. Peter Read, Frances Peters-Little, and Anna Haebich (Can- 
berra: ANU Press, 2008), 5-28; and Anita Heiss, Dhuuluu-Yala = To Talk Straight: Publishing 
Indigenous Literature (Canberra: Aboriginal Studies Press, 2003).

identity, family,¹⁰ and land¹¹)? How is reality represented (and constructed)
in the corpus? Does the corpus demonstrate any changes in discourse through- 
out the decades of the genre’s existence? These were some of the questions 
guiding my study. 

The project is interdisciplinary and draws insights from such fields as
corpus and computational linguistics, Australian history, literary studies,
Indigenous and postcolonial studies, history of concepts,¹² natural language
processing, and computer science. As it is a computer-assisted study,
transforming research questions into formal computational enquiries
(operationalizing)¹³ has been a crucial step. What do we mean by “themes” or
“discourses” and what operations must be performed to examine them within the
corpus? This question is best answered through a discussion of the
methodology and the theoretical assumptions behind it.


3 Distributional semantics and vector space 
modeling for exploring semantic fields 


Vector space modeling was developed in computer science as a method for
information retrieval.¹⁴ It was designed to represent textual documents as
numerical vectors based on the frequency of occurrence of individual words in
them. In word embedding modeling, a more recent development of vector space
modeling, vectors are used to represent individual words and reflect how they
are positioned relative to each other in the space of all words from a
corpus, based on their co-occurrence patterns. Thus, such vectors are
believed to reflect the words’ semantic and syntactic properties. This method
is grounded in distributional semantics and the distributional hypothesis,
according to which words


10 Anne Brewster, Reading Aboriginal Women’s Autobiography (Melbourne: Sydney University 
Press in association with Oxford University Press, 1996), 5.

that share similar contexts (i.e. are surrounded by similar words) tend to have
similar meanings.¹⁵

Word embeddings are used in natural language processing in tasks like
classification, question answering, and many others. In digital humanities,
word vector representations are a valuable output in themselves: they are
often not fed into any further algorithm but rather explored in terms of the
distance between them as a measure of semantic closeness. By exploring the
vector space of words in a corpus, and the words in proximity to certain
target words (related to the concepts we are interested in), we can discover
the situated meaning of these words as defined by the way they are used in
the corpus. Word embedding modeling has been recognized in the digital
humanities community for facilitating exploration of diachronic meaning
shifts, domain-specific language use, and discursive spaces.¹⁶ For the
purposes of my study I considered these discovered sets of words (“nearest
neighbors”) as discourses or semantic fields:¹⁷ networks of related words
with underlying social and political meaning, each representing a slice of
reality as it is perceived by a specific group of people in a defined period
of time and reflected in language in use.¹⁸ Word embedding modeling has
demonstrated its potential for highlighting semantic fields and discourses in
textual data and therefore was the main method I chose for my project.
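
Looking up the “nearest neighbors” of a target word amounts to ranking all other words by the similarity of their vectors to the target’s vector. The sketch below assumes cosine similarity as the measure, a common choice that the text does not name explicitly, and uses invented three-dimensional vectors; vectors learned from a corpus typically have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(target, vectors, k=2):
    """Return the k words whose vectors are most similar to the target's."""
    scores = {word: cosine(vectors[target], vec)
              for word, vec in vectors.items() if word != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented vectors, for illustration only.
vectors = {
    "land":    [0.9, 0.1, 0.0],
    "country": [0.8, 0.2, 0.1],
    "home":    [0.6, 0.5, 0.1],
    "school":  [0.1, 0.2, 0.9],
}
neighbors = nearest_neighbors("land", vectors)
```

The list of neighbors returned for a target word is the raw material that is then read and interpreted as a semantic field.
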


4 The “lure of objectivity” — and transparency
as a way to resist it


4.1 Does using a computational tool make a study more 
objective? 


It seems to me that the “lure of objectivity” that Rieder and Röhle¹⁹
describe as one of the challenges faced by digital humanities scholars is one
of the reasons word embedding modeling has become so attractive. It has been
argued that traditional humanities approaches (e.g. close reading) are prone
to researcher bias, which is especially important when dealing with
emotionally charged topics, and word embedding modeling has been suggested as
an effective way to make the study more impartial.²⁰ I have been exploring a
corpus containing traumatic memories which are sometimes extremely sad or
even shocking to the average reader. Using computational technologies in
general seems to offer a way of distancing oneself from such texts, allowing
an impartial assistant, an algorithm, to “run through” the data and mine it
for some precious pieces of knowledge without being affected by the emotions
and biases inherent to humans.

However, a closer look at how word embedding modeling works shows that it is
not reasonable to view this method, or indeed any computational method, as an
impartial, unbiased helper. It is important to remember that, in addition to
the biases a researcher inevitably introduces at every stage of the research
process (from data collection to modeling and interpretation) through the
choices they have to make, the computational tool itself is a product of its
designer’s choices and decisions and, therefore, by its very nature cannot
be objective.

Thus, instead of relying blindly on the tool or arguing that it is objective,
or, on the contrary, rejecting the tool as not being impartial because of
such a “flaw,” we should admit to and embrace the subjective nature of
computer-assisted humanities research and commit to transparency in our
research.


4.2 Unboxing the black box tools: Transparency in digital
humanities research


Transparency in research concerns making all aspects of the study process
more visible and strengthening its credibility by, for example, sharing data
and code, and disclosing the decisions involved in the research process.
However, there is an extra step that is often neglected, especially, it
seems to me, in interdisciplinary studies using methods from a field to which
the researcher is not “native”: ensuring transparency of the “black boxes,”
not just to others but first of all to ourselves. Thus, transparency should
also concern “our ability to understand the method, to see how it works,
which assumptions it is built on, to reproduce it, and to criticise it.”²¹
How does the tool I use manipulate the data and change the way I, the
researcher, interact with the data and draw insights from it? Not only should
I aim to understand this myself and critically reflect on it, but I should
also disclose my conclusions to others.

In a project like mine, one of the ways to make the research process more 
transparent is to publish code, trained models, and corpus metadata. Code doc- 
umentation is a good practice in both the software and science worlds. When 
done well, documentation helps future readers and users of the code under- 
stand what each line does to the data and how this in turn impacts the research 
output. Making the models available lets other researchers explore the data
and conduct their own experiments. Although the lack of access to the raw
data (due to copyright) will be a limitation for them, being able to examine
the models’ outputs should provide an interesting way to complement or guide
a close reading. Because the books are copyrighted, I will not be able to
share the full texts of the modeled corpus, but the corpus metadata will be
an important window onto my data.

Moreover, when using software packages (such as, in my case, Gensim or
NLTK) or tools with a graphical user interface (e.g. Embedding Projector),
transparency also means understanding how they work and disclosing this
information along with the critical discussion, instead of just presenting
impressive visualizations and hiding methodological decisions.


5 Digital tool criticism: Choosing between
count-based and predictive modeling


A traditional count-based co-occurrence model is a word-context matrix show- 
ing how often each corpus vocabulary item co-occurs with every other vocabu- 
lary item. Each word is represented as a corresponding row of such a matrix 
through its relationship with all other words. Therefore, each matrix value dem- 
onstrates the strength of association between two words, and words with simi- 
lar co-occurrence patterns will be mapped to similar vectors. 
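
A word-context matrix of this kind can be built in a few lines. The sketch below is an illustration only: the two sentences are invented, the symmetric window of two words is an arbitrary choice, and raw counts are used without any weighting.

```python
from collections import Counter, defaultdict


def cooccurrence_matrix(sentences, window=2):
    """Count how often each word occurs within `window` words of another."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


# Invented, pre-tokenized sentences.
sentences = [
    ["my", "mother", "told", "stories"],
    ["my", "father", "told", "stories"],
]
matrix = cooccurrence_matrix(sentences, window=2)

# Each word's vector is its row of the matrix, read in a fixed column order.
vocabulary = sorted(matrix)
mother_row = [matrix["mother"][w] for w in vocabulary]
father_row = [matrix["father"][w] for w in vocabulary]
```

Here “mother” and “father” never occur together, yet they receive identical rows because they share the same contexts, which is the property the paragraph above describes.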

The more recent approaches to word vector representations are based on
neural networks, which are loosely inspired by the way our brains work. One
widely used algorithm is word2vec.²² In the resulting representation of a
word, vector dimensions, in contrast to count-based models, are not
interpretable but are believed to capture some aspects of the word’s
meaning.²³ This type of word embedding modeling is called predictive because
word vectors are essentially a by-product of the algorithm performing a
prediction task (predicting the context of a keyword, or a keyword for a
given set of context words, depending on the algorithm’s variation). The
word2vec algorithm takes a large amount of text as input and, through
working on a prediction task, learns a vector representation of each
vocabulary item such that words used in similar contexts receive similar
vectors.
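
The two variations of the prediction task correspond to word2vec’s two architectures, skip-gram (predict the context from a keyword) and CBOW (predict a keyword from its context). The sketch below shows only how the training examples for each are derived from a sentence. It does not train a network, and the helper names and the window of one word are assumptions made for illustration.

```python
def context_of(tokens, i, window):
    """Words within `window` positions of position i, excluding i itself."""
    start, end = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(start, end) if j != i]


def skipgram_pairs(tokens, window=1):
    """Skip-gram: one (keyword, context word) pair per context word."""
    return [(target, ctx)
            for i, target in enumerate(tokens)
            for ctx in context_of(tokens, i, window)]


def cbow_examples(tokens, window=1):
    """CBOW: one (context words, keyword) example per position."""
    return [(context_of(tokens, i, window), target)
            for i, target in enumerate(tokens)]


tokens = ["we", "walked", "home"]
pairs = skipgram_pairs(tokens)
examples = cbow_examples(tokens)
```

During training, the network’s weights are adjusted to make these predictions more accurate, and the learned weights are then kept as the word vectors.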

So how do we select an approach that is suitable for our data and goals?
Word2vec, like many other machine learning algorithms, requires a large
dataset to learn representations accurately. It has been noted that small
dataset sizes can affect the accuracy and reliability of modeling²⁴ and
that, for such corpora, co-occurrence matrices could be a better solution.
Furthermore, a comparative study requires models to be trained on subcorpora
(for example, based on the publication date for a diachronic investigation),
but each subcorpus may contain too few examples for the algorithm to learn
reliable word representations. In addition, word2vec modeling has been
criticized for the inherent randomness involved in its generation of word
vectors, which affects the reproducibility of studies.²⁵ This property stems
from the random initialization of vectors at the beginning of each
experiment run and from the order in which the examples are processed.
Lastly, if a researcher chooses to apply subsampling of frequent words,
which the algorithm allows, this probabilistic procedure will introduce even
more randomness and thus add to the reliability issue.

My corpus, like so many other digital humanities datasets, is relatively 
small. So, to avoid the issues related to data size described above, I could have 
used count-based methods instead of a neural network-based one. Indeed, it 
has been argued that if the corpus is relatively homogeneous, with texts be- 
longing to a narrow domain and one genre, the number of words required to 
build a reliable model may be smaller, as such texts may offer more consistent 
contexts.*° The randomness problem is just as important — and count-based 
modeling is a definite winner here. However, for a more comprehensive picture, 
using both types of model and comparing (or even consolidating) their outputs 
could be a promising scenario. 
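To make the contrast concrete, a count-based model can be sketched in a few lines. The function name `cooccurrence_counts`, the window size, and the two toy sentences are assumptions made for illustration. The point is only that such counts involve no random step, so repeated runs on the same data give the same result.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each word occurs within `window` tokens of another."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    counts[target][tokens[j]] += 1
    return counts


# Invented toy sentences, not taken from the corpus.
toy_sentences = [
    ["we", "played", "football", "every", "weekend"],
    ["she", "played", "softball", "every", "summer"],
]
toy_counts = cooccurrence_counts(toy_sentences, window=1)
```

Each row of such a table can serve as a (sparse) word vector. In practice the raw counts are often reweighted or reduced in dimensionality, steps that are left out here.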

I began by using Gensim’s word2vec implementation²⁷ in Python; this
was not without its challenges. In 2018, Gensim’s creators ran a user survey 
and learned that their documentation was considered lacking. This was indeed 
an issue that I had encountered previously. However, the situation seems to 
have improved since then: there are now helpful tutorials available and the 
code is better documented. But while I could have used a tutorial without fully 
understanding what it does to the data, this would have diminished the trans- 
parency of my project and therefore its reliability. I had committed to learning 
more about the algorithm, machine learning, and neural networks in general. 

I had to make multiple decisions when applying the algorithm to model my 
data: for example, choosing the algorithm architecture, vector size, number of
words in the context window, and minimum count parameter (the algorithm 
would ignore words with total frequencies lower than this number), to name 
just a few. It has been argued that there is no optimal combination of parame- 
ters and that the choice of parameters is generally based on the researchers’ 
experience in training such models, as well as on the research questions and 
the nature of the data.²⁸
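One way to keep such decisions transparent is to record them in a single place and publish them alongside the results. The sketch below is a hypothetical record: the class `TrainingParams`, its field names, and all values are illustrative assumptions, not the settings used in this study. It also shows the effect of the minimum count parameter on a toy corpus.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingParams:
    architecture: str  # "skip-gram" or "cbow"
    vector_size: int   # dimensionality of each word vector
    window: int        # context words considered on each side
    min_count: int     # words rarer than this are ignored
    seed: int          # fixed seed, recorded for reproducibility


def apply_min_count(sentences, min_count):
    """Drop every token whose total corpus frequency is below min_count."""
    freq = Counter(tok for sent in sentences for tok in sent)
    return [[tok for tok in sent if freq[tok] >= min_count] for sent in sentences]


# Illustrative values only; they are not the settings used in the study.
params = TrainingParams("skip-gram", vector_size=100, window=5, min_count=2, seed=42)
params_json = json.dumps(asdict(params), indent=2, sort_keys=True)

toy = [["we", "played", "footy"], ["we", "played", "cards"]]
pruned = apply_min_count(toy, params.min_count)  # "footy" and "cards" are dropped
```

The JSON string can be stored next to the trained model, so that each reported result can be traced back to the parameter choices behind it.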

How do such decisions impact the research process, its outcomes, and inter- 
pretation? One example: in both predictive and count-based models, the size of 
the context window and its type (symmetrical/asymmetrical) must, as mentioned
earlier, be pre-defined by the researcher. The impact of this decision on the
learned vectors is quite significant: it has been suggested that larger context win- 
dows tend to provide more semantic information, while smaller ones provide 
more syntactic context. This and many other algorithm parameters are “built-in” 
and, while a user (a researcher) can opt out of defining some of them, many 
must still be set, and such choices should be justified and disclosed. 
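The difference between a symmetrical and an asymmetrical context window can be shown with a small helper. The function `context_words` and the example sentence are invented for illustration and are not part of any library.

```python
def context_words(tokens, index, left=2, right=2):
    """Return the tokens inside the context window around tokens[index]."""
    start = max(0, index - left)
    before = tokens[start:index]
    after = tokens[index + 1:index + 1 + right]
    return before + after


# Invented example sentence; the target word is "football" (index 3).
toy = ["my", "brother", "played", "football", "for", "the", "school"]
symmetric = context_words(toy, 3, left=2, right=2)
asymmetric = context_words(toy, 3, left=3, right=0)
```

Here the symmetrical window yields "brother", "played", "for", and "the", while the left-only window yields "my", "brother", and "played". The same target word is therefore described by different evidence depending on a setting that readers will not see unless it is reported.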

To conclude, we should not refrain from using tools from a new and often 
unfamiliar field, but should remember that using them without learning the 
fundamentals of how they work, how they manipulate the data and change our 
perspectives on the data, may lead to misunderstanding the study’s potential 
and reducing its reliability. 


6 Corpus design: Digital source (and tool) criticism


My research questions and the envisioned computational approach required 
creating a corpus: a digital collection of texts meeting certain criteria. In this 
section, I reflect on the nature of the data from the perspective of digital source 
criticism and digital tool criticism. Data is not, in fact, “data” but rather, as
Drucker puts it, “capta”: it is not “a ‘given’ able to be recorded and observed”
but rather is “‘taken’ actively.”²⁹ It is important to understand and be
transparent about the constructed and selective nature of the corpus used for 
modeling and subsequent analysis. While the computational tools themselves 
introduce subjectivity, the processes of data collection and remediation (digiti- 
zation) cannot be seen as objective and impartial either. 


6.1 Creating a bibliography: Search and critical evaluation 


At virtually every stage of a digital humanities (DH) project, data undergoes
certain reductions,³⁰ a fact that has been one of the main criticisms of DH as a
field.³¹ The very first reduction happens at the stage of selecting texts to include
in the bibliography.

At the beginning of my study there was no existing corpus of Indigenous 
Australian autobiographical works. Therefore, the first step was to create one — 
and before that to compile a bibliography of all published works that met cer- 
tain selection criteria. 

One of the challenges in creating a full bibliography of Indigenous Australian
autobiographies concerns the definitions and resulting classifications.³²
Who should be considered an Indigenous Australian author? What about co- 
authored works? “As-told-to” works? What indeed is an autobiography? 

There were two existing bibliographies: Horton’s (1988) “non-exhaustive” 
list of Indigenous Australian literature that had been published between 1924 
and 1987, including 21 works classified by him as life writing,³³ and Haag’s
(2011) bibliography of 177 autobiographies published between 1950 and 2004.³⁴
As for the more recently published works, to the best of my knowledge, there 
was no bibliography that listed them. 

There is a noticeable difference between the existing classifications of
Indigenous Australian writing in the academic literature. For example, Brewster,
in her article “Aboriginal life writing and globalisation,” discusses the book
Follow the Rabbit-Proof Fence by Doris Pilkington,³⁵ which she describes as a
“biographical story” and “documentary life writing.” However, Haag does not
include this work in his bibliography. He also excludes I, the Aboriginal (the
autobiography of Phillip Roberts, an Indigenous Australian, written based on
multiple interviews by Douglas Lockwood),³⁷ arguing that whether to consider
it an autobiography or not “is a matter of perspective.”³⁸ Pilling, on the other
hand, calls it an autobiography in his book review (although he adds that it
was “edited and re-written somewhat by Lockwood,” Lockwood being a European
anthropologist).³⁹ At the same time, Horton notes that the book is “a biography
of Roberts, however, the author, Lockwood, has chosen to make him the
implied author of the text which results in a text that is not completely
authentic. The language used by the implied author far exceeds the ability of any
‘Noble Savage’ created by Lockwood.”⁴⁰ This case demonstrates the subjectivity
of the data I was modeling, as I had to consider the often controversial
definitions and classifications conceived by others, including those based on rather
discriminatory assumptions.

With these two lists as a starting point, I started searching the Internet for
other works. The Google search engine played a crucial role in this process of
bibliography creation. Its decision-making, like that of any other computer
system, is based on rules and criteria defined by human designers, who
decide which resources will be shown to me first, and which last. For example,
webpages with better usability and accessibility for various types of browsers and
devices will be ranked higher, and my location will be taken into account
(unless I switch this function off) when displaying the search results the system
considers more relevant for the query.⁴¹

Moreover, I have to critically evaluate any information I find about the genre 
and the works supposedly belonging to it. Is the information authoritative, genu- 
ine? Who created this list? What classifications and definitions is it based on? If 
one uses a query like “Indigenous Australian autobiography” for a Google search 
today, among the top search results it is likely to return a link to the two-part 
Goodreads list that I myself have been compiling over the last three years.⁴² At
the current stage of development this list is not well documented, but to ensure 
its transparency and allow users to make decisions on its trustworthiness I am 
planning to add more information about the choices I have made to create it 
(e.g. the definitions of “an Indigenous Australian author” or “autobiography”
that I have used). 

Lastly, there is one case that demonstrates the perils of online search and
the importance of digital source criticism, but also the complexities of the
Indigenous Australian literary scene. Here is what was written on the back cover
of the 1994 (first) edition of My Own Sweet Time: “This is a lively, gutsy story of
an urban Aboriginal girl making it in the tough city counter culture of the
mid-sixties.”⁴³ The author herself is described in the book as follows: “Wanda
Koolmatrie was born in the far north of South Australia in 1949. Removed from her
Pitjantjara mother in 1950, she was raised by foster parents in the western
suburbs of Adelaide, where she went to school, leaving in 1966 and moving to the
eastern states. [. . .] She is currently living in London UK and among other
things working on her next novel.”⁴⁴

Fast-forward 12 years, and here is the book’s second (2006) edition and the 
corrected information about the author: “Leon Carmen was born and educated 
in South Australia. Wearied by a string of menial jobs, such as cabbie, musi- 
cian, et cetera, he turned to story telling. As Wanda Koolmatrie, he wrote ‘My 
Own Sweet Time’, which won the $5,000 Dobbie Award, the prize later being 
recalled when the author drew attention to Wanda’s fictional status. Mr. Carmen
now lives in Ireland.”⁴⁵

In 1997, the book had been discovered to be a hoax, a fiction written by a 
white male taxi driver rather than an autobiography of an Indigenous Australian
woman.⁴⁶ However, it had already won an award for a first novel by a female
writer and been included in numerous Indigenous Australian studies
reading lists. 

If I had come across the 1994 first edition without having access to any sup- 
plementary information about the book and the hoax, I could have been misled 
and included this first edition in my bibliography and the publicly available 
Goodreads list, with the image of the “About the author” page, thus unwittingly 
misinforming whoever decided to rely on my list. This example shows how
important it is to critically reflect on the reliability of the digital, especially
online, sources we plan to use for our research.

As a result of merging and editing the two bibliographies described earlier, 
and supplementing them with the works I found during my online search, I 
constructed a bibliography of 289 entries (where I considered short stories in- 
cluded in a book or published as part of an online project, as well as full-size 
literary works, as separate entries) spanning the period between the 1950s and 
2020.


6.2 From bibliography to corpus as a model


Creating a corpus from a bibliography can be seen as modeling — mapping from 
the original (for example, language used by a particular social group or, as in 
my study, a genre) to a representation which we believe reflects the qualities of 
the original. However, to be truly representative and thus allow for generaliza- 
tion (that is, using the corpus as a proxy for the whole universe of Indigenous 
Australian life writing), a sample must be random, which was not the case in 
my project. Therefore, I set a goal not to provide a generalizable outcome but 
rather to investigate the use of language in this particular corpus which, I be- 
lieve, is suitable for the task. 


6.3 Digitization and born-digital materials 


Digitizing was the next data reduction stage. The books in my corpus were 
scanned with a Treventus ScanRobot automatic book scanner at the University 
of Luxembourg’s DH Lab, and I was personally involved in the digitization pro- 
cess, thus learning about the scanning and post-processing technology (includ- 
ing skew correction, rotation, and cropping of page images) and gaining a good 
understanding of how remediation may transform the data. The output PDF 
files were then processed with ABBYY FineReader optical character recognition 
software and converted into text files. In addition to the digitized data, I also 
included in my corpus the born-digital short autobiographical stories published 
as part of the University of Queensland’s “Growing up Indigenous in Australia” 
project.⁴⁷


6.4 Preprocessing 


Before proceeding to modeling, the corpus had to be preprocessed to make 
modeling more computationally efficient. Reducing vocabulary size when using 
word embedding modeling is a double-edged sword: on the one hand, it should 
help create a more accurate model; on the other hand, reducing the size of a
corpus that is already quite small may impact the model quality.


6.4.1 Removing irrelevant text sections


My first manipulation of the data from the raw text files was removing material 
written by people other than the Indigenous Australian authors (image cap- 
tions, introductions, and acknowledgments), tables of contents, text on the 
back cover, and other textual elements outside of the autobiographical portion 
of the book. Each of these transformations would impact the resulting data to 
be modeled, and I had to consider every decision carefully, including such deli- 
cate cases as books with large portions of text in an Indigenous Australian lan- 
guage, or co-authored books in the form of questions and answers, or those in 
the form of a mixture of scientific and testimonial writings. 


6.4.2 Tokenizing 


The next step was splitting the texts into smaller chunks (tokenizing): to
prepare input for word embedding modeling, the corpus was turned into a list of
sentences, with each sentence represented by a list of tokens. Tokens may include
not only words but also numbers and punctuation marks.
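The target structure, a list of sentences each holding a list of tokens, can be sketched with regular expressions from the Python standard library. The patterns below are deliberately naive and are an assumption of this sketch. They are not the tokenizer used in the study, and they will mis-split abbreviations such as “Mr.”.

```python
import re

# Split after sentence-final punctuation that is followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A token is either a word (optionally with an apostrophe part) or one
# punctuation mark.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Turn raw text into a list of sentences, each a list of tokens."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    return [TOKEN.findall(s) for s in sentences]


# Invented sample text, not taken from the corpus.
sample = "I was born in 1949. We moved south, didn't we?"
tokenized = tokenize(sample)
```

Note that the number “1949” and the punctuation marks survive as tokens at this stage. Whether to keep them is a separate preprocessing decision.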


6.4.3 Stopwords, numbers, and punctuation 


There are words that are very common and seem to be of relatively low value 
for text analysis; these are referred to as stopwords.⁴⁸ Removing such words
helps reduce noise in the data and as a consequence makes the model more 
memory-efficient and accurate. 

I used the stopword list from the NLTK package,⁴⁹ which includes words like
“I,” “me,” “my,” “by,” “for,” “some,” “other,” and “haven’t,” among others. 
However, the list is generic and does not take into account specific aspects of the 
domain under study. Another option would have been to create a custom list 
based on my corpus, where the discriminative power of words could have 
been measured more precisely. Alternatively, I could have used the “noun-only”
approach (filtering based on the part of speech), but then there was a chance of
missing some aspects of the texts presented in other parts of speech.

While removing non-alphabetic symbols (e.g. punctuation and numbers) is 
a common preprocessing step, for some tasks it may be disadvantageous or 
even harmful: thus, for example, stylometry and author identification research 
may require leaving punctuation (and pronouns) in the corpus, whereas for my 
project punctuation was removed. 
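Both filtering decisions can be expressed as one small function. The stopword set below is a hypothetical seven-word stand-in rather than the NLTK list, and the flag `keep_non_alpha` is an invented name for the choice between removing and keeping numbers and punctuation.

```python
# Tiny stand-in list for illustration; the study used the NLTK stopword list.
TOY_STOPWORDS = {"i", "me", "my", "by", "for", "some", "other"}


def filter_tokens(tokens, stopwords=TOY_STOPWORDS, keep_non_alpha=False):
    """Remove stopwords and, unless requested otherwise, non-alphabetic tokens."""
    kept = []
    for tok in tokens:
        if tok.lower() in stopwords:
            continue
        if not keep_non_alpha and not tok.isalpha():
            continue
        kept.append(tok)
    return kept


# Invented example sentence.
tokens = ["I", "played", "for", "my", "school", "in", "1965", "."]
filtered = filter_tokens(tokens)
with_punct = filter_tokens(tokens, keep_non_alpha=True)
```

With the default setting only “played,” “school,” and “in” remain. With `keep_non_alpha=True` the year and the full stop are retained, as a stylometric study might require.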


6.4.4 Lowercasing 


Lowercasing is another common preprocessing step in natural language
processing (NLP) that is helpful in many use cases, for example in information
retrieval applications, where it helps the search engine to find, say, Apple
applications even if a user does not capitalize the word “apple.”

If I did not lowercase my corpus, the model would treat capitalized
words at the beginning of sentences differently from the same words occurring
elsewhere (and not capitalized), which would negatively impact accuracy. At
the same time, once the corpus is lowercased, “Liberal” (“a member of a liberal
party in politics, especially of the Liberal party in Great Britain”) and
“liberal” (“favorable to progress or reform, as in political or religious
affairs”),⁵⁰ for example, are treated as the same word. Moreover, when applying
a neural network method like word2vec, one must remember that the learned word
representations will be greatly affected by the number of occurrences presented
to the algorithm, and that lowercasing has a certain impact on this number for
each vocabulary item.
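The effect on occurrence counts is easy to see on a toy example. The token list below is invented. It only illustrates how lowercasing merges sentence-initial and sentence-internal forms while also conflating “Liberal” and “liberal”.

```python
from collections import Counter

# Invented toy tokens, not taken from the corpus.
tokens = [
    "Sport", "was", "everything", ".", "We", "loved", "sport", ".",
    "The", "Liberal", "candidate", "held", "liberal", "views", ".",
]

cased = Counter(tokens)                           # "Sport" and "sport" differ
lowered = Counter(tok.lower() for tok in tokens)  # both forms are merged
```

Before lowercasing, “Sport” and “sport” each occur once. Afterwards “sport” occurs twice, and so does “liberal,” although the two original forms of the latter carried different meanings.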


7 “The power of visual evidence” and a brief discussion of initial experimental results


In digital humanities, images are used not only as a communication tool but,
and perhaps more importantly, as an analytical tool allowing the output of
algorithms to be investigated more thoroughly. First of all, following Rieder and
Röhle’s advice,⁵¹ I want to note that the visualizations I present in this section
showcase an interim result of the ongoing and iterative research process.
Visualizing word embeddings is a challenging task primarily because of the high
dimensionality of the learned word vector space.

To incorporate my first experimental results in this discussion of
visualization, I will use an example from my study and explore the discourse of sport by
drawing on the paper by Osmond investigating the discussions of sport in
Indigenous Australian autobiographies.⁵² Osmond emphasizes the importance of
sport for Indigenous Australian communities and with his study aims to
“re-read” memoirs to explore how sport is discussed in life writing. I expected that
word embedding modeling could help do exactly that: explore “what is said
and why” about particular concepts and topics. Osmond argues that life writing
can highlight the subjective meaning of sport as represented by language in
use. While he focuses on a specific geographic location (four Indigenous
Australian communities with which he works), I attempted to investigate the
language use related to sport in the whole corpus I had created, which can be seen
as an extension to his study.

The most straightforward and accessible way to present and analyze results 
from word embedding modeling is to generate a list of a user-defined number 
of the words positioned nearest (based on their presumed semantic similarity) 
to a certain keyword. 

How many neighboring vectors should be considered as the most important 
for analysis? What if I decide to only look at the top 20 but number 21 is more 
insightful in the context of the study? Often, cosine distances between neigh- 
boring words and the keyword are very similar and a researcher has to decide 
which words to include in the analysis (for example, by setting a cut-off thresh- 
old). It is easy to see how even decisions made at the visualization stage can 
impact the interpretation of the results. 
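The two cut-off decisions discussed above, a fixed number of neighbors versus a similarity threshold, can be sketched as follows. The two-dimensional vectors are invented, and the functions are a plain-Python illustration rather than the Gensim routine. The sketch ranks by cosine similarity, which is simply one minus the cosine distance.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(vectors, keyword, top_n=20, min_similarity=None):
    """Rank all other words by similarity to `keyword`, then apply cut-offs."""
    target = vectors[keyword]
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word != keyword
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if min_similarity is not None:
        scored = [pair for pair in scored if pair[1] >= min_similarity]
    return scored[:top_n]


# Invented toy vectors; real word vectors have many more dimensions.
toy_vectors = {
    "sport": [1.0, 0.0],
    "football": [0.9, 0.1],
    "softball": [0.8, 0.3],
    "river": [0.0, 1.0],
}
top_two = nearest_neighbours(toy_vectors, "sport", top_n=2)
above_threshold = nearest_neighbours(toy_vectors, "sport", min_similarity=0.95)
```

On this toy data a top-two cut-off keeps “football” and “softball,” whereas a threshold of 0.95 keeps only “football.” The list a reader sees therefore depends on which cut-off rule the researcher chose.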

Analysis of the names of the sports disciplines in the list of nearest
neighbors shows that the top results include “softball,” “athletics,” “rugby,”
“soccer,” “tennis,” “hockey,” and “netball.” This supports the claim by Osmond
that, according to the analysis he had conducted on his corpus, “all works
referring to sport focus primarily on introduced sports rather than traditional
sporting, physical, or recreational activities,” which is explained by “the early
imposition of Western cultures and the suppression of traditional pursuits.”⁵³


51 Rieder and Röhle, “Digital Methods.” 

52 Gary Osmond, “Playing the Third Quarter: Sport, Memory and Silences in Aboriginal Mem- 
oirs,” Australian Aboriginal Studies 2 (2019): 73-88. 

53 Osmond, “Playing the Third Quarter,” 79.

However, one of the top ten words appears to be “didge,” short for
“didgeridoo,” a traditional Indigenous Australian musical instrument used for
ceremonies or recreation. This can be interpreted as a sign of the continuing role of
Indigenous Australian traditions, but of course close reading would help to
understand the context better. In addition, words such as “prowess,” “elite,”
“excelled,” and “career” may signify what Osmond describes as “the link between
sport and self-esteem,” as playing sports served as a confidence boost for
Indigenous Australian people, as a “ticket out,” and as a tool for building community.
In general, the terms associated with sport seem to be neutral or positive.

To visualize the vector space, the number of its dimensions must be
reduced to two or three for it to be comprehensible by humans. TensorFlow’s
Embedding Projector is one popular tool allowing visualization of word2vec
output.⁵⁴

First, the tool’s usability is worth commenting on. Embedding Projector re- 
quires a user to upload two separate files: one with vectors and one with corre- 
sponding labels (tokens). However, Gensim’s word2vec outputs only one file 
with the model and so some additional steps on the part of the researcher are 
required to extract the two files. Further, Embedding Projector is not very well 
documented and would have benefited from additional online tutorials and 
case studies on topics related to digital humanities to facilitate its use. More- 
over, in my opinion, it is another example of the “lure of objectivity” and may 
be misleading for humanities scholars because the visualization is not based on 
the original data the researcher uploads. 
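The additional steps mentioned above amount to writing the vectors and their labels into two tab-separated texts. The sketch below assumes that the word vectors have already been read into a plain dictionary, and the function name `to_projector_tsv` is hypothetical. To my knowledge, Embedding Projector expects one vector per line in the first file and one label per line, in the same order, in the second.

```python
import csv
import io


def to_projector_tsv(vectors):
    """Return (vectors_tsv, metadata_tsv) strings with rows in the same order."""
    vec_buf, meta_buf = io.StringIO(), io.StringIO()
    vec_writer = csv.writer(vec_buf, delimiter="\t", lineterminator="\n")
    for word, vec in vectors.items():
        vec_writer.writerow(vec)     # one vector per line, tab-separated
        meta_buf.write(word + "\n")  # the matching label on the same line number
    return vec_buf.getvalue(), meta_buf.getvalue()


# Invented toy vectors; in practice these would come from the trained model.
toy_vectors = {"sport": [1.0, 0.0], "football": [0.9, 0.1]}
vectors_tsv, metadata_tsv = to_projector_tsv(toy_vectors)
```

The two strings can then be saved as, for example, `vectors.tsv` and `metadata.tsv` (file names chosen here for illustration) and uploaded to the tool.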

To transform the multidimensional vectors into 2- or 3-dimensional ones for
further visualization, Embedding Projector applies one of the dimensionality
reduction algorithms (UMAP, t-SNE, or PCA). Selecting one of these is another
decision to be made and justified by the researcher, who should understand how
choosing to use the tool and a certain dimensionality reduction method may
impact the interpretation of the results. For example, PCA does not try to preserve
all distances between the vectors but aims to maximize the variance retained
in the few dimensions that remain after the transformation, whereas t-SNE tries
to preserve distances but is stochastic and can therefore produce different results
for every run, even with the same data and parameters.⁵⁵ To avoid this
randomness, instead of using the Projector visualizations it is possible to build a t-SNE


54 Daniel Smilkov et al., “Embedding Projector: Interactive Visualization and Interpretation 
of Embeddings” (paper presented at 30th Annual Conference on Neural Information Process- 
ing Systems, Barcelona, November 2016). 

55 Chris Culy, “Word Vectors with Small Corpora: Visualizing Word Vectors,” accessed June 17, 
2021, https://www.chrisculy.net/lx/wordvectors/wvecs_visualization.html.

visualization using the scikit-learn machine learning library for Python.⁵⁶ The
advantage of using this library is that a value can be set for the
“random state” parameter to get the same visualization at each algorithm run.

Embedding Projector allows users to build a customized projection based 
on specific keywords used as axes to find how words are located in the space in 
relation to these defined axes and explore if this relationship is meaningful. 
Thus, Fig. 1 shows how “softball” seems to be located more to the left on the 
“woman-man” (left-right) axis than “footy,” or “football,” or “basketball.” 
This can be seen as supporting the fact that softball is traditionally considered 
a female sport.

Fig. 1: Projection of the word “softball” on the “woman-man” (left-right) axis. 2020.
© Ekaterina Kamlovskaya. 
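The idea behind such a keyword-defined axis can be approximated by a scalar projection. The sketch below is one plausible reading of the technique, not the Embedding Projector’s actual implementation, and the two-dimensional vectors are invented. A value of 0 corresponds to the left pole (“woman”) and a value of 1 to the right pole (“man”).

```python
def project_on_axis(vec, left_vec, right_vec):
    """Position of `vec` along the axis running from left_vec (0) to right_vec (1)."""
    axis = [r - l for l, r in zip(left_vec, right_vec)]
    offset = [x - l for x, l in zip(vec, left_vec)]
    axis_length_sq = sum(a * a for a in axis)
    return sum(o * a for o, a in zip(offset, axis)) / axis_length_sq


# Invented toy vectors chosen only to illustrate the calculation.
woman, man = [0.0, 1.0], [1.0, 0.0]
softball, football = [0.2, 0.7], [0.7, 0.2]

softball_pos = project_on_axis(softball, woman, man)
football_pos = project_on_axis(football, woman, man)
```

With these toy values “softball” lands closer to the “woman” pole than “football” does. In a real model, the positions depend on every preprocessing and training decision described earlier, so such a projection is a prompt for close reading rather than evidence in itself.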


56 Fabian Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine 
Learning Research 12 (2011): 2825-30.

To sum up, my first experimental results suggested that word embedding
modeling is a promising method for a corpus investigation in the humanities but 
needs to be used with caution, after careful consideration of numerous factors 
that may influence the algorithm output and model interpretation, and hence 
close reading is recommended to support the analysis. 


8 Conclusion 


Using computational methods and tools for an exploration of a usually rela- 
tively small and domain-specific humanities corpus is often a difficult task due 
to the limitations imposed by these tools and methods. However, instead of
rejecting a computational approach altogether, it is worth investigating the
opportunities it may offer while ensuring transparency of the project methodology
and making oneself aware of the implications of the methods used for the
results and interpretations. In this chapter, I have reflected on the challenges I
am encountering in my corpus-based study of the genre of Indigenous Austra- 
lian autobiography, from the corpus construction stage, through modeling, to 
interpretation of the algorithm outputs and visualizations. Digital source and 
tool criticism are important for ensuring the transparency and reliability of a 
study, and understanding and documenting the inner workings and decisions 
to be made while using tools and methods borrowed from a different field are 
challenging but extremely important aspects of an interdisciplinary digital
humanities project.


Thomas Durlacher

Philosophical perspectives on 
computational research methods 
in digital history 


The cases of topic modeling and network analysis 


This chapter has three main objectives: firstly, to discuss several philosophical 
positions regarding research methods; secondly, to outline certain features of sci- 
entific methods; and, thirdly, to use this terminology to look at digital history. 

In a brief sketch on several historically important philosophical positions 
concerning research methods I first aim to show that the search for the one “cor- 
rect” scientific method has recently given way to a more pluralistic conception of 
research practices. 

Next, I outline some of the general features of scientific methods, not as a
comprehensive description of research methods, but rather as an attempt to 
shed light on the often neglected point that methods are closely related to the 
academic goals we are working toward. Although these goals may be uncertain 
or changing, critical reflection on the connection between methods and what 
we are trying to achieve in our research has the potential to increase our aware- 
ness of the limitations and possibilities of certain methods. 

Lastly, I use this philosophical terminology to look at digital history, a
comparatively new historical subdiscipline that is distinguished by its
computational methods, and discuss two different digital methods. My PhD project is
concerned with the investigation of a specific methodological practice,
computational modeling, on which there are still ongoing debates as to what the
feasible goals for this method could be. The following methodological reflections
are part of my ongoing investigations into the nature of research methods.


1 Historical perspectives on research methods and philosophy


Before the establishment of independent philosophical subdisciplines associ- 
ated with individual scientific disciplines, philosophical investigations into the 
nature of research methods often coincided with the task of explaining human 
reasoning. Although these investigative attempts aimed for generality, they 
also emphasized the need to provide concrete instructions on what the scien- 
tific method should look like. Examples of this approach toward methods are 
well known. René Descartes suggested that knowledge proceeds from first
principles known a priori and with certainty,¹ while Francis Bacon claimed that we
gain knowledge of the world by collecting observable evidence and then extend
this knowledge by generalization.²

During the course of the twentieth century this traditional philosophical 
view of research methods changed dramatically. The development can most 
easily be summarized under the label of diversification, which describes the 
process from the search for the one “correct” scientific method toward a more 
pluralistic conception of academic research in general. In this pluralistic land- 
scape, general approaches and specific studies of individual elements of re- 
search can be seen as complementary, rather than in conflict with each other. 

In the first half of the twentieth century, general approaches to methods 
and objectives in the humanities were less widespread than in the sciences but 
still common. In contrast to the philosophical discussion about natural science, 
which focused on the logical relationship between theories and evidence, the 
debate in the humanities focused on the question of how the human dimension 
of the research object requires a specific method and thus distinguishes natural 
science from the humanities.³ Participants in these discussions emphasized
that the study of human experiences depends on a specific form of understand- 
ing conceived as a distinct kind of hermeneutic practice, which is not adaptable 
to the natural sciences. 


1 René Descartes, “Rules for the Direction of the Natural Intelligence,” in Descartes: Selected 
Philosophical Writings, ed. John Cottingham, Robert Stoothoff, and Dugald Murdoch (Cam- 
bridge: Cambridge University Press, [1701] 1999), 1-19. 

2 Francis Bacon, The New Organon, ed. Lisa Jardine and Michael Silverthorne (Cambridge: 
Cambridge University Press, [1620] 2000), 33. 

3 See for example Wilhelm Dilthey, Introduction to the Human Sciences, ed. Rudolf Makkreel 
and Frithjof Rodi, Selected Works I (Princeton: Princeton University Press, 1989); or Max 
Weber, “Objectivity in Social Science and Social Policy,” in The Methodology of Social Sciences, 
ed. Henry Finch and Edward Shils (New York: Free Press, 1949), 50-112.

It is sometimes assumed that in the humanities the systematic discussion
of methods is less widespread than in the sciences. That this is not the case can
be seen from the recent work of Rens Bod.⁴ In his innovative account of the
historical development of the humanities, he carefully outlined the importance of
methodological principles within the humanities.⁵

The philosophy of science has also moved away from the grandiose old 
philosophical systems toward more specific questions concerning scientific 
work, such as: What should be considered as evidence and how can it be
related to theories? A general and still fairly influential proposal in this regard
was the hypothetico-deductive approach of Carl Hempel.⁷ Put in simple terms,
this approach considered the scientific method as consisting in the suggestion 
of a hypothesis, the derivation of consequences from this hypothesis, and the 
testing of whether those consequences can be observed. For Hempel, this ap- 
proach provided a general procedure to get us closer to the conceived goal of 
science, i.e. the formulation of laws of nature. 

Although philosophical accounts of research methods such as the hypo- 
thetico-deductive approach provide useful insights into the logic of research, it 
is clear that this kind of philosophical theorizing started with an already com- 
paratively abstract picture of the objects of research and how they should be 
investigated. These accounts rarely reached the level of the working researcher 
and the more mundane problems of their work. The second half of the twenti- 
eth century saw considerably more attention being paid to the local circumstan- 
ces of knowledge production. The watershed moment in this process toward 
more attention being paid to local research practices was the publication and 
reception of Thomas Kuhn’s monograph The Structure of Scientific Revolutions. 


4 Rens Bod, A New History of the Humanities: The Search for Principles and Patterns from An- 
tiquity to the Present (Oxford: Oxford University Press, 2013), 364. 

5 For a general examination of the role methods play in the sciences see Robert Nola and Ho- 
ward Sankey, Theories of Scientific Method: An Introduction (Stocksfield: Acumen, 2007); Hugh 
Gauch, Scientific Method in Brief (New York: Cambridge University Press, 2012). For an over- 
view of the role that methods play in the humanities and history see Simon Gunn and Lucy 
Faire, eds., Research Methods for History (Edinburgh: Edinburgh University Press, 2012); and 
James E Dobson, Critical Digital Humanities: The Search for a Methodology (Urbana: University 
of Illinois Press, 2019). 

6 In this context, the discussion revolved around deductive, inductive, and abductive reason- 
ing. See Nancy Cartwright, Stathis Psillos, and Hasok Chang, “Theories of Scientific Method: 
Models for the Physico-Mathematical Sciences,” in The Cambridge History of Science: The 
Modern Physical and Mathematical Sciences, ed. Mary Jo Nye, vol. 5 (Cambridge: Cambridge 
University Press, 2003), 21-35. 

7 Carl G. Hempel, “Studies in the Logic of Confirmation,” Mind 54, no. 213 (1945): 1-26. 
According to the Kuhnian picture of science, methods are embedded in historically 
changing paradigms.⁸ Kuhn questioned traditional distinctions between 
normative and descriptive approaches toward research methods and argued 
that the rules for their application and evaluation depend on the larger context 
of a paradigm. Philosophers like Paul Feyerabend further undermined the dis- 
tinction between normative and descriptive approaches toward research meth- 
ods by claiming that there are no genuine normative methodological principles 
at all.⁹ In the aftermath of the Kuhnian revolution, the study of science took a 
variety of different forms, ranging from historical studies focusing on the episte- 
mological principles behind methods and the sociological context of research, to 
a more general practice-oriented approach.’ These approaches found that re- 
search methods can have a wide variety of context-dependent functions, mirroring 
the heterogeneity of the different disciplines themselves. 

After several decades of intense intellectual exchange, neither the older, 
more general approaches nor the newer contextual approaches toward scientific 
research methods have prevailed. The current state of the philosophy of science 
is characterized by the comparatively peaceful coexistence of the different 
approaches. In Section 2 I outline one central feature of research methods, 
their goal-directedness, which is especially important to understanding how 
such methods can be evaluated. 


2 Methods and goals 


Methods, in the sciences as well as the humanities, are means to attain the 
goals of individual disciplines such as history, biology, or physics. “Means” 
here primarily designate a set of activities a researcher can engage in. These 
activities range from what goes on in one’s mind while doing research (reason- 
ing, thinking, imagining, inferring) to actions that involve interaction with our 


8 Thomas S. Kuhn, The Structure of Scientific Revolutions, 4th ed. (Chicago: The University of 
Chicago Press, 2012), 8. 

9 Paul Feyerabend, Against Method, 3rd ed. (London: Verso, 1993), 14. 

10 See Bruno Latour and Steve Woolgar, Laboratory Life: the Construction of Scientific Facts 
(Princeton: Princeton University Press, 1979), 21-42; or Andrew Pickering, Constructing Quarks: 
A Sociological History of Particle Physics (Chicago: University of Chicago Press, 1984), x. 

11 See Ian Hacking, The Taming of Chance (Cambridge: Cambridge University Press, 1990), 
1-10; as well as Philip Kitcher, In Mendel’s Mirror: Philosophical Reflections on Biology (Oxford: 
Oxford University Press, 2003), xi. 
environment (observing, measuring, gathering data, reading, conducting 
interviews, writing, collecting specimens, performing experiments). 

Methods provide a focal point for a discipline’s self-identity. Traditionally, 
students are introduced to a research field by learning to master the most im- 
portant methods of that field. This process leads the novice from the laborious 
study of procedures, principles, and rules, to full immersion in a discipline. 
Students thus acquire the ability to apply these methods, without assistance, in 
order to answer new research questions. Research questions often include the 
formulation of certain goals and in most cases also specify the methods to be 
used to reach those goals. 

In practice, it is often the case that the proposed goals and methods of a 
research project change over time, in an iterative process, but this does not im- 
pair the close relation between methods and goals. 

The goals of methods can encompass general objectives such as knowl- 
edge, prediction, control, explanation, and understanding, as well as domain- 
specific, lower-level objectives such as the accurate description of a historical 
event, the explanation of a physical phenomenon, the classification of 
biological species, or the collection of evidence. It is important to notice 
that talk of methods being goal-directed is elliptical: it is shorthand for the 
fact that researchers use methods to achieve certain goals. Therefore, it is 
not a method in itself that achieves a goal but the researcher implementing 
the method who achieves the goal. 

Usually, it is assumed that the achievement of these goals is not the result 
of arbitrary luck, but rather the outcome of the systematic work of a community 
of researchers who think about and critically evaluate their methods. This al- 
ready reveals one central point about methods. The evaluation of a method de- 
pends crucially on the goals we have. A method is not intrinsically good, bad, 
or adequate but is good, bad, or adequate in relation to a specified goal the 
method is directed toward, as well as in relation to the goals of the discipline. 
Goals in this sense are determined by individual researchers and the scientific 
community. Sometimes higher-level goals and lower-level goals conflict with 
each other or are not coordinated appropriately to further the progress of a dis- 
cipline. A lower-level objective can be perceived as undesirable by some re- 
searchers because they are not aware of how it contributes to higher-level 
goals. On the other hand, it is also possible to criticize a method when it is not 
clear how the method contributes to the overall goals of the discipline. 

A method can be said to be adequate if it helps us to achieve a certain 
objective. Wendy Parker defines the adequacy of a tool for a purpose in the 
following way: 
ADEQUACY_C: A tool M is ADEQUATE_C-FOR-P if and only if, in C-type instances of use 
of M, purpose P is very likely to be achieved.¹² 


We can reformulate this conception of adequacy for methods in general as: 


ADEQUACY_M: A method M is ADEQUATE_M-FOR-G if and only if, in C-type instances of 
use of M, goal G is very likely to be achieved. 


The notion of ADEQUACY_M helps us describe methods as a reliable way to reach 
a certain goal. In this sense, methods are fallible and depend on the presence of 
the right circumstantial factors. C-type instances designate the context in which 
a method is used. The method to measure temperature, for example, consists of 
the use of a thermometer in a certain unobstructed context. In this case, the goal 
is the representation of temperature. The adequacy of a method can be 
established in one of two ways: either the method has been successful in the 
past, or we understand the underlying processes of the method well enough to be 
confident in its effectiveness before actually testing it. 
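
The first of these two routes, success in the past, can be illustrated with a 
small Python sketch. It is an illustration only: the record of past uses, the 
tolerance, and the numerical threshold standing in for "very likely" are 
assumptions introduced here and are not part of Parker's definition. 

```python
# Hypothetical record of past uses of a thermometer in one type of context C:
# (reference temperature, thermometer reading), both in degrees Celsius.
past_uses = [(0.0, 0.2), (20.0, 19.9), (37.0, 37.1), (60.0, 60.4), (100.0, 98.2)]


def success_rate(uses, tolerance):
    """Share of past uses in which the reading was within `tolerance` of the reference."""
    if not uses:
        raise ValueError("at least one past use is needed")
    hits = sum(1 for reference, reading in uses if abs(reading - reference) <= tolerance)
    return hits / len(uses)


def is_adequate(rate, threshold=0.9):
    """Treat the goal as 'very likely' achieved if the rate reaches an assumed threshold."""
    return rate >= threshold


rate = success_rate(past_uses, tolerance=0.5)
print(rate, is_adequate(rate))
```

The verdict depends on the chosen tolerance and threshold as much as on the 
readings, which reflects the point that adequacy is always relative to a 
specified goal and context. 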

It is not always easy to say what the objectives of a method are. The objec- 
tive cannot be a specific result. It rather has to be something like a range of 
possible outcomes informing us about the object under investigation. What I 
mean by a range of possible outcomes is that a tool, like a thermometer, or a 
procedure, like the measurement of temperature, is not used to depict a single 
temperature point but rather represents the temperature of the object it is ap- 
plied to at the time of the measurement. 

Consider a situation described by the historian and philosopher of science 
Hasok Chang.¹³ In the early days of the history of thermometers, scientists had 
no way to judge the correctness of those instruments, except by comparing 
them with each other. It proved especially difficult to establish fixed tempera- 
ture points (which in turn were needed to create quantitative scales), like the 
boiling point of water, in situations where there were no independent methods 
of temperature measurement available. This problem was particularly hard to 
solve because it was not known if certain physical phenomena, such as the 
boiling point of water, appear at a fixed temperature point at all. The problem 
persists even if we account for the exclusion of distorting factors like impurities 
in the water, atmospheric pressure, and so on. Here, the aim of the instrument — 
to measure temperature — was itself such an obscure notion that it was difficult 


12 Wendy S. Parker, “Model Evaluation: An Adequacy-for-Purpose View,” Philosophy of Science 
87, no. 3 (2020): 461. 

13 Hasok Chang, Inventing Temperature: Measurement and Scientific Progress, Oxford Studies 
in Philosophy of Science (Oxford: Oxford University Press, 2007), 57-102. 
to assess the reliability of the methods used. In the end, a variety of different 
measurement methods (one of which was the experienced body temperature) 
were used to correct each other. This turned out to be useful for studying the 
phenomenon, as well as for improving the methods over time. 

In general, research methods can assume two different roles.¹⁴ The distinction 
between these roles shares similarities with the distinction between the 
context of discovery and the context of justification in the philosophy of science. 
In the first role, methods can encompass activities that have an auxiliary func- 
tion in the research process. 

Procedures used to acquire funding, determine how to get to conferences, or 
decide how to organize teaching activities are practical research-related activities. 
Such methods, although important to the research enterprise and probably sys- 
tematic, do not play a role in the way in which we justify our knowledge claims, 
and they therefore belong to the context of discovery. This context also includes 
sources of inspiration outside the realm of rational justification, such as dreams, 
spiritual inspirations, and subjective preferences. In their second role, methods 
can also support the results of research in an epistemic manner. An example here 
is the use of comparative script analysis to date an inscription. In this case, the 
procedure we use to determine the date of the inscription (i.e. the comparison of 
different texts) provides a reason for us to believe that the inscription has a 
certain age, and thus belongs to the context of justification. Faulty procedures 
undermine knowledge claims, for example if the corpus of texts is incomplete. Proper 
procedures, in contrast, strengthen knowledge claims. In the following, I will be 
primarily concerned with methods in this narrower, epistemic sense. Since the 
second half of the twentieth century, the distinction between the context of 
discovery and the context of justification has been a point of contention. There is 
a sense that epistemic considerations play a role even in the discovery process, 
while, in actual research, what is claimed to be done or believed for epistemic 
reasons is sometimes influenced or distorted by external, non-epistemic factors. 
For conceptual clarification, it is nonetheless useful to distinguish between 
these two roles that methods play, even if the distinction cannot always be 
sharply drawn.¹⁵ In history, the epistemic function of methods is generally 
accepted. In this respect Jörn Rüsen writes: 


14 Nola and Sankey, Theories of Scientific Method, 18-9. 

15 This distinction was originally popularized by Hans Reichenbach. For a more recent discus- 
sion see Jutta Schickore and Friedrich Steinle, Revisiting Discovery and Justification: Historical 
and Philosophical Perspectives on the Context Distinction (Dordrecht: Springer, 2006), vii-xix. 
Why method? It is a matter of acquiring historical knowledge from the empirical facts that 
are left from the past and thereby, in general, accessible in the present (so to speak, in 
front of your eyes). The methodological procedures of this acquisition serve to strengthen 
this knowledge and to systematically justify its plausibility or validity. Methods make 
knowledge justifiable by verifiability of its statements.¹⁶ (Translation my own.) 


The application of methods also distinguishes research as a systematic 
enterprise. Doing research means having a plan: it embodies some kind of order 
and is not arbitrary. Even when this order is intentionally given up, as in the 
case of exploratory or speculative research, such research should be clearly 
distinguished from method-based research. This systematic approach also 
contributes to the progress of academic research, given that the progress of a 
discipline depends not only on what we know but also on how we get to know it. 

Establishing adequate goals for methods is in itself a sophisticated part of 
scientific research. With new areas of research especially, it usually takes time 
to figure out how certain methods can be used. One such comparatively new 
research field is the focus of the next section. 


3 Digital history 


In some instances, methods are influential enough to create scientific disciplines 
and subdisciplines around them. Digital history, a subdomain of history, is a 
case in point, but what is it about? In the following I present two proposals for 
defining the field. 

We can define digital history in a first approximation as the historical 
subdiscipline concerned with the use of digital methods to study the past.¹⁷ Digital 
methods used in digital history are dependent on computers and their various 
capacities, such as the performance of computations and the processing and 
storage of data. This definition presents digital history as an area characterized 
by the application of certain computational/digital techniques. It is not unusual 
to describe a historical subdiscipline in this way. Oral history, for example, is 
characterized by its focus on the acquisition and use of certain sources and not 
by a specific topic.¹⁸ 


16 Jörn Rüsen, Historik: Theorie der Geschichtswissenschaft (Cologne: Böhlau Verlag, 2013), 55. 
17 For a recent review of the state of the digital history subdomain see Annemieke Romein 
et al., “State of the Field: Digital History,” History 105, no. 365 (2020): 291-312. 

18 Donald A. Ritchie, Doing Oral History, 3rd ed., Oxford Oral History Series (New York: Oxford 
University Press, 2015), xiv. 
Unfortunately, this simple definition has the drawback that it is too broad to 
be very useful. It would make every historian a digital historian because the use 
of computers has permeated the academic landscape more or less completely. To 
characterize the whole of academic history as digital history would run counter 
to our desire to delineate an area within history in which the use of computa- 
tional techniques has taken on a special role distinct from the everyday uses of 
those techniques.¹⁹ 

If we are interested in getting a better understanding of digital history on a 
theoretical level we have to further specify how the computer is used by digital 
historians. My second definition characterizes digital history not only by its use 
of computers but by the fact that this work could not be done without computers. 
In this sense, the digital historian is a historian whose work would not be 
possible without the help of a computer.²⁰ This also means that the computer plays a 
special role in the justification of the claims in this area. The definition could 
therefore be rephrased as: digital history is the historical subdiscipline in which 
a certain kind of knowledge of the computer as a tool to justify historical claims 
is indispensable. This definition has the advantage of capturing our intuitive feel- 
ing that not every use of the computer has the same importance for the outcome 
of our research. Using digitized pictures of historical events can be important, 
but our knowledge of the computer we use plays a comparatively minor role in 
the claims we make with the help of those pictures. But if we use a database to 
store and query a large number of pictures or other data, knowledge of how the 
query works is indispensable for the reliable use of the technique.²¹ 

Working with large amounts of data, and the sophisticated representation 
and visualization of these data with the help of automated algorithms, fall 
within this second definition. Given the comparatively recent origin of the field 
of digital history, this list of methods is not fixed, nor is it foreseeable 
which methods will be permanently established within history.²² But there are, 


19 I assume that the establishment of specialized online platforms like https://ranke2.uni.lu/ 
and https://programminghistorian.org/ for teaching the application of the computer as a re- 
search tool, along with the establishment of specialized journals and research centers, is a 
manifestation of the process in which the computer has taken on this role. 

20 Here, “would not be possible without the help of a computer,” should be interpreted more 
in practical terms: it is of course imaginable in theory that, given enough time and resources, 
humans could perform the tasks of computers, but it is clear that this is not possible in 
practice. 

21 This does not mean that everything about a tool has to be known in order to use it, but 
rather that, for certain uses, some sort of basic understanding is necessary. 

22 Romein et al., “State of the Field,” 310. 
nonetheless, clear examples of the application of digital methods extending the 
horizons of traditional historical research. 

I should also add another clarification. Digital history, although it involves 
the use of a computer, is not limited to computational methods. This is impor- 
tant because, in most cases, we see that computational methods are embedded 
in a web of other research activities. In Section 2, I argued that methods are 
directed toward certain goals. What about the goals of digital history? This 
question can only be answered by looking at specific methods. I will therefore 
look at two well-established methods in this area: topic modeling and social 
network analysis. At first sight, it may seem that digital history, because it is 
defined through its methods rather than through its goals, is directed toward 
the traditional goals of history. I also mentioned in Section 2 that different 
methods can be used to achieve the same goals; a change in methods therefore 
does not necessarily imply a change in goals. But changed methods certainly 
create the possibility for the consideration of new goals. We see this 
clearly with my first example of a computational method: topic modeling. 


3.1 Topic modeling 


The computational study of text corpora was one of the first applications in the 
humanities to use the calculating power of modern computing machines.²³ 
Nowadays, machine learning techniques such as topic modeling have become 
an attractive method for studying large amounts of textual data. Because of the 
highly structured way in which text is available, it is comparatively easy to 
transfer text documents into machine-readable form, thereby making the 
processing of large amounts of text possible.²⁴ The early use of computational text 
analysis coincided with the traditional role of text as the primary form of evi- 
dence in the humanities. The reading of a text provides humans with informa- 
tion that goes beyond the perception of markings on a page. The traditional 
way of describing this feature of language is that words and sentences have se- 
mantic meaning. A sentence can provide information about the intentions, be- 
liefs, and desires of an author, and can constitute evidence if we are interested 
in exploring those things. A text, as the manifestation of the writing behavior of 


23 Susan Hockey, “The History of Humanities Computing,” in A Companion to Digital 
Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Oxford: Blackwell, 2004), 3-19. 
24 For other text-based methods and natural language processing techniques see chapter 3 of 
Shohreh Haddadan, chapter 4 of Ekaterina Kamlovskaya, and chapter 6 of Eva Andersen in 
this volume. 
an author, can also provide us with information beyond the conscious mental 
state of the author, allowing us to interpret the writing as the outcome of the 
cultural practices, social relations, and power structures of the time. Given the 
fundamental interest of historians in questions such as why somebody acted 
the way they did, or how somebody experienced something, textual evidence 
and methods related to textual sources have been of prime importance in his- 
torical research. In contrast to the automated processing of text, this traditional 
form of reading is known as close reading. 

Topic modeling algorithms analyze text and calculate the probabilities for 
certain groups of words to co-occur.²⁵ The assumption here is that words that 
occur together share a semantic relationship. It is intuitively plausible that if 
the words “garden,” “flower,” and “earth” appear together in a text, there also 
exists a semantic relationship between them. 
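
The co-occurrence assumption can be made concrete with a short Python sketch. 
It is not a topic model such as latent Dirichlet allocation but a much simpler 
count of how often two words appear in the same document; the four example 
documents and the function names are invented here for illustration. 

```python
from collections import Counter
from itertools import combinations

# Hypothetical miniature corpus; each string stands for one document.
documents = [
    "the garden has a flower bed and dark earth",
    "she planted a flower in the garden",
    "the company sent men to war",
    "men of the company took up arms",
]


def cooccurrence_counts(docs):
    """Count, for every pair of distinct words, the documents containing both."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.split()))  # each word counted once per document
        counts.update(combinations(words, 2))  # pairs come out in alphabetical order
    return counts


pair_counts = cooccurrence_counts(documents)


def cooccurrence_share(word_a, word_b):
    """Share of documents in which both words occur."""
    pair = tuple(sorted((word_a, word_b)))
    return pair_counts[pair] / len(documents)


print(cooccurrence_share("garden", "flower"))
print(cooccurrence_share("garden", "war"))
```

In this toy corpus "garden" and "flower" share two of four documents, while 
"garden" and "war" share none. Frequent words such as "the" co-occur with 
almost everything, which is one reason such words are usually filtered out 
before analysis. 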

A well-known example of this is Robert K. Nelson’s Mining the Dispatch 
project.²⁶ Nelson used topic modeling to mine a large number of fugitive slave 
advertisements from the Daily Dispatch newspaper of Richmond, Virginia in order to 
explore the changes of topic over time. A topic like military recruitment was 
identified by words like “service,” “men,” “company,” “arms,” “state,” “companies,” 
“Virginia,” “war,” and so on.²⁷ In this way, it was possible to discover some of 
the unexpected aspects of these ads, such as humor.²⁸ More recent applications 
of topic modeling have operated in a similar way and have shed new light on 
large-scale cultural developments in areas like the history of science, economics, 
and music production.²⁹ 

Topic modeling assumes that the probability of words occurring together in 
a text is an expression of a semantic relationship. In practice, this is not always 
the case. Words may appear together by coincidence, without representing any 
semantic relationship. Before a text can be analyzed, words like “the,” “of,” 
25 David M. Blei, Andrew Y. Ng, and Michael I. Jordan, “Latent Dirichlet Allocation,” Journal of 
Machine Learning Research 3 (2003): 993-1022. 

26 “Mining the Dispatch,” last modified November 2020, http://dsl.richmond.edu/dispatch/ 
pages/home. 

27 “Mining the Dispatch,” last modified November 2020, https://dsl.richmond.edu/dispatch/ 
topic/32. 

28 “Mining the Dispatch,” last modified November 2020, https://dsl.richmond.edu/dispatch/ 
introduction. 

29 Shawn Martin, “Topic Modeling and Textual Analysis of American Scientific Journals, 
1818-1922,” Current Research in Digital History 2 (2019); Lino Wehrheim, “Economic History 
Goes Digital: Topic Modeling the Journal of Economic History,” Cliometrica 13, no. 1 (2019): 
83-125; and Matthias Mauch et al., “The Evolution of Popular Music: USA 1960-2010,” Royal 
Society Open Science 2, no. 5 (2015): 1-10. 
and “to” have to be removed because they are less significant in determining 
the semantic topics in a text. In cases of words with little semantic value it is 
necessary for the researcher to manually distinguish significant from 
non-significant results.³⁰ 
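
The removal of such function words can be sketched as follows. The stop-word 
list below is a small set chosen here for illustration and is not a standard 
list; punctuation handling is deliberately left out. 

```python
# Illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"the", "of", "to", "a", "and", "in"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it on whitespace, and drop the stop words."""
    return [word for word in text.lower().split() if word not in stop_words]


tokens = remove_stop_words("The men of the company went to war")
print(tokens)
```

Which words belong on the list is itself a decision made by the researcher, 
and a different list can change which topics a model later reports. 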

Historians have for a long time been interested in large-scale developments 
such as the changes in public opinion in a country or the existence of certain 
cultural practices over long periods of time. Without quantitative methods, 
arguments of this scope cannot be justified, as the cognitive abilities of humans 
are limited.³¹ In such cases, automated procedures are needed to help researchers, 
and topic modeling can be seen as an auxiliary tool for automatically finding 
certain semantic correlations in texts. But, by providing new methods, 
computational tools also create new goals and transform older ones. The goal of 
training a machine learning model on a large corpus of texts in order to detect 
topics did not exist in the analog era. And a formerly unfeasible goal, such as 
the large-scale description of major cultural developments that at the same time 
captures at least some aspects of the outcome of cultural practices like writing 
texts, becomes much more tractable than with traditional methods. 


3.2 Social network analysis 


Social network analysis quantitatively describes the connections between dif- 
ferent entities within a network. It thereby provides the possibility of represent- 
ing certain relationships according to the rules of graph theory, the mathematical 
subdiscipline whose rigorous framework can be used to formulate explicit 
definitions about the constituents of a network.³² A graph consists of a set of 
vertices (nodes) with lines (edges) between those vertices. Different kinds of 
centrality measures can be used to describe and visualize how the nodes rep- 
resenting the entities in the network are related. The formal representation of 
the relationship between different entities within a social network makes it 


30 Matthew L. Jockers and Rosamond Thalken, Text Analysis with R: For Students of Litera- 
ture, Quantitative Methods in the Humanities and Social Sciences (Cham: Springer Interna- 
tional Publishing, 2020), 230. 

31 Paul Humphreys, Extending Ourselves: Computational Science, Empiricism, and Scientific 
Method (New York: Oxford University Press, 2004), 6. 

32 For more details about social network analysis see chapter 1 of Antonio Fiscarelli in this 
volume. See also Garry Robins, Doing Social Network Research: Network-based Research Design 
for Social Scientists (London: Sage, 2015). 
possible to use automated algorithms to explore the properties of the network 
or to visualize it. 
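
A minimal Python sketch shows how such a formal representation lends itself to 
automated processing. The edge list is invented for illustration and is not 
taken from any historical dataset; the degree count used here is only the 
simplest of the available centrality measures. 

```python
from collections import defaultdict

# Hypothetical directed edges (giver, receiver) standing for observed gestures.
gestures = [("A", "B"), ("B", "A"), ("C", "A"), ("D", "A"), ("D", "C")]


def degrees(edges):
    """Number of edges touching each node, counting incoming and outgoing ones."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


def reciprocated_pairs(edges):
    """Unordered pairs of nodes that are connected in both directions."""
    edge_set = set(edges)
    return {frozenset(edge) for edge in edge_set if (edge[1], edge[0]) in edge_set}


node_degrees = degrees(gestures)
# Nodes ordered from most to least connected; ties are broken alphabetically.
ranking = sorted(node_degrees, key=lambda node: (-node_degrees[node], node))
print(ranking, reciprocated_pairs(gestures))
```

Here node "A" has the highest degree and only the tie between "A" and "B" is 
reciprocated. What such a number means historically is not given by the 
calculation and still has to be interpreted. 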

The entities represented in this way do not have to be individual humans but 
can also include words, institutions, material things, and so on. The only impor- 
tant thing is that the relationship between the entities can be expressed in math- 
ematical form. The example in Fig. 1 shows the relationship between a set of 
school pupils. Every circle represents a child in a school class. The data were col- 
lected by Johannes Delitsch, a pioneer of social network analysis, in the 1880s. 
Based on his observations in class he created a table of friendship gestures, recip- 
rocal relationships, and other measures. The nodes in this figure were ordered 
according to the degree of connectedness to other nodes, with the nodes with the 
highest degrees of connectedness shown in the middle and colored in darker 
blue than the rest of the nodes.” 

Network analysis is most applicable in cases where we assume we will find 
significant relationships between entities. The relationships which constitute a 
network are not only idle ways to describe individual facts but rather can be 
used to explain certain effects that are dependent on the existence of a network. 
These effects might include the spread of information, or a disease, as well as the 
likelihood that certain events will take place. Traditional history often depends 
on narratives as a main tool to represent the past. Network analysis extends the 
toolbox of possible representations of the past by use of a formally rigorous the- 
ory of networks. The interpretation of what exactly is represented with the help 
of networks is dependent on the historian. As in the case of topic modeling, the 
ability to automatically create and visualize networks from large datasets allows 
historians to bring new details into focus. Describing centrality, for example, is 
an easy way to formulate hypotheses about a social actor in a network. 


33 Richard Heidler et al., “Relationship Patterns in the 19th Century: The Friendship Network 
in a German Boys’ School Class from 1880 to 1881 Revisited,” Social Networks 37, no. 1 (2014): 
1-1. 

34 The original data used to create this network were compiled by the German primary school 
teacher Johannes Delitsch. Between 1880 and 1881, he observed the behavior of his pupils’ 
school class. I created the picture with the Gephi graph visualization software. Data: 
https://github.com/gephi/gephi/wiki/Datasets. For the use of Gephi, see Mathieu Bastian, 
Sebastien Heymann, and Mathieu Jacomy, “Gephi: An Open Source Software for Exploring and 
Manipulating Networks” (Third International Conference on Weblogs and Social Media, ICWSM, 
San Jose, USA, 2009). 
Fig. 1: Visualization of a friendship network between schoolchildren, created with Gephi. The 
surnames of children have, except for pupil Pfeil, been abbreviated to the first letter of their 
surname. The color densities indicate the number of gestures of friendship Delitsch observed; 
the arrows show whether the friendship was reciprocated or not reciprocated. 2019. 

© Thomas Durlacher. 


3.3 The pitfalls of computational methods 


Of course, digital (or computational) methods are not without their issues. So, 
what are some of these problematic features? I want to outline two such features 
which are especially relevant for the epistemic function of digital methods in the 
research process. The first concerns the plasticity of computational representa- 
tions. Once data are in a machine-readable binary format it is easy to make 
changes to them and manipulate them. The way we visualize and represent 
something with the help of a computer, although it can be expressed explicitly, is 
also prone to being modified. This has to do with the flexibility of a computer as a 
universal computing machine.³⁵ In the examples I mentioned in Sections 3.1 and 
3.2 it is easy to see that the careless interpretation of the output of computational 
methods could undermine the usefulness of the method. Personal biases, as well 
as a lack of understanding of what the computer is doing, magnify the problem. 

The second problematic feature of computational methods I want to discuss 
here concerns epistemic opacity.³⁶ Computer programs often present themselves 
as black boxes where only the input and the output are accessible to the re- 
searcher. In the context of scientific research, this feature has been called episte- 
mic opacity because the complex and autonomous structure of the programs 
used obscures the epistemic role that different program parts play. For non- 
epistemic tasks this is not problematic, because only the result, and not how it 
was generated, counts. In Section 2, I mentioned that methods and procedures 
can have an epistemic role. If it is not clear what is happening during a proce- 
dure then we do not know how it supports our claims. Therefore it is of great 
importance for historians to extend their critical methods and understand those 
parts of programs and algorithms that are relevant for their knowledge claims. In 
the context of digital hermeneutics, this task has been described as a continuous 
process that accompanies every step in the research process. Algorithm criticism, 
digital source criticism, tool criticism, interface criticism — all are part of a meth- 
odological reflective process aimed at ensuring the reliability of the methods we 
use.37 In cases of novel techniques imported from other disciplines, this reflective
process will be supported by experiments to reveal possible applications in the 
research process. 

Plasticity, like epistemic opacity, is connected with the strengths of computa- 
tional methods, automated processing, and rule-based representational techni- 
ques. In topic modeling, as with social network analysis, both of these features 
can undermine the results of our research. In the case of topic modeling, the machine
learning algorithm estimates how likely words are to occur together, but when we
look at the results alone it is not immediately clear how they were generated. Important
decisions have to be made by the researcher: the number of topics has to
be chosen and parameters configured. This makes the topic modeling method


35 Johannes Lenhard, Calculated Surprises: A Philosophy of Computer Simulation (New York: 
Oxford University Press, 2019), 10. 

36 Humphreys, Extending Ourselves, 147. 

37 Andreas Fickers, “Update für die Hermeneutik. Geschichtswissenschaft auf dem Weg zur
digitalen Forensik?,” Zeithistorische Forschungen — Studies in Contemporary History 17, no. 1 
(2020): 157-68. 
susceptible to being fitted toward a preferred outcome. To a certain degree, this
may be true for all methods, but the novelty and lack of well-established stand- 
ards is especially worrisome in the case of computational methods which have 
not been used in historical research before. 

When it comes to social network analysis, these problems mainly appear in 
the ways in which networks are visualized. Automated algorithms are often 
used to bring networks into a visually appealing form. Here too, the way the 
network is presented often remains a mystery to the user. One way to counter- 
act such problems is to reverse engineer the results and try to independently 
confirm that an outcome is meaningful and not just the artifact of an algorithm.38
This requires time and resources but is of great epistemic importance
with regard to the role that methods play in the research process. 
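
One modest form of such independent confirmation can be sketched in code. The example below is an illustration under stated assumptions, not a method taken from the cited literature: it recomputes a layout-independent measure (normalised degree) directly from an edge list, so that a node's apparent prominence in a drawing can be checked against the underlying data. The edge list and node names are invented placeholders.

```python
def degree_centrality(edges):
    """Normalised degree: share of the other nodes each node is linked to."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Invented network; the letters stand in for historical actors.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

centrality = degree_centrality(edges)
most_connected = max(centrality, key=centrality.get)
```

Because the measure depends only on the edge list, it stays the same however a layout algorithm happens to place the nodes. If a node looks central in a visualization but scores low here, the visual impression deserves a second look.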


4 Conclusion 


Methods play a central role in academic research. Because of their importance, 
reflection on methods and their evaluation, from the perspective of historians
as well as those collaborating with them, is critical to ensure that research is a
systematic enterprise. For history, this is important for its internal, as well as its 
public, accountability. The evaluation of methods depends crucially on the 
goals those methods are directed toward, which are themselves part of an intri- 
cate web of goals and values in a discipline. A lower-level goal like the repre- 
sentation of a social network, or the automated detection of topics in a text 
corpus, does not always fit into the web of the higher-level goals of a research 
project or, on an even higher level, a discipline. When a research project is, for 
example, purely focused on individuals, it has to be argued how or whether 
these methods, usually aimed at the analysis of macrostructures, will contrib- 
ute to the purpose of the project. Some hints of how this is possible have been 
given in Sections 3.1 and 3.2. Of course, the evaluation of these methods will 
not always result in a positive conclusion. The introduction of new methods 
also needs to be accompanied by discussion and reflection on the ways these 
methods can be integrated into and used in a discipline. Many of the chapters 
collected in this volume provide examples of this process and give a good ac- 
count of how such developments are currently shaping digital history. 


38 Juan M. Durán and Nico Formanek, “Grounds for Trust: Essential Epistemic Opacity and
Computational Reliabilism,” Minds and Machines 28, no. 4 (2018): 645-66. 
In the field of history, the most recent methodological innovations in the
form of computational techniques also require the critical assessment of those 
methods to make sure they reliably serve the epistemic aims of historians. In 
the case of computational methods, I have pointed out two features of these 
methods that could, if ignored, undermine their epistemic function: i.e. their 
plasticity and epistemic opacity. 

Biases, lack of understanding, and unfeasible goals can be a detriment to 
research. This chapter can be understood as an invitation to critically compare 
the methods introduced by digital history with the general aims of the historical 
enterprise. In this regard, the cases of topic modeling and social network analy- 
sis are intended to show how computational techniques are related to the aims 
of history and how they can change our representations of the past. 
II From ‘source’ to ‘data’ and back


Eva Andersen 
From search to digital search 
An exploration through the transnational history of psychiatry 


1 Introduction 


Historians are trained to critically interpret the past. To do this they are in- 
structed in a variety of archival and writing skills, as well as critical thinking, 
and are taught a research workflow in which searching for primary sources, 
verifying their authenticity, and undertaking close reading to understand, ana- 
lyze, and interpret them, are all fundamental to writing critical and comprehen- 
sive reflections of past events. 

Although, for historians, search is just one aspect of their repertoire, it is a 
vital skill and one in which they become very efficient. Historians have always 
relied on this competency: searching for people (e.g. librarians and archivists) or 
within archive catalogs to direct them to relevant material, as well as searching 
or browsing through primary sources in order to find useful information. Without 
search, there are no sources or text passages to be read and interpreted. Of 
course, the term “research” itself is a derivative of the word “search.”1 In other
words, search is where all historical investigations start, which is why search, rather
than other skills of the historian such as analyzing and interpreting, is
central to this chapter.

Since the advent of digital history and digital humanities in the late twenti- 
eth century, the historian’s “traditional” workflow has often been juxtaposed 
with these “new” forms of research practices and tools. Digital history has be- 
come associated with terms such as big data, algorithms, programming lan- 
guages, text mining, topic modeling, network analysis, etc. Even now, some 
humanities scholars can be wary of incorporating certain approaches promoted 
within digital humanities. They have come to associate digital history with 


1 “Definition of Research,” Merriam Webster, accessed September 19, 2020, https://www.mer 
riam-webster.com/dictionary/research. 


Note: Part of this chapter is based on earlier blogposts as well as a draft version of the methodology
chapter of my dissertation. I want to thank Lincoln Mullen (George Mason University)
and Matteo Romanello (École polytechnique fédérale de Lausanne) for their useful and constructive
feedback; my colleagues at the University of Luxembourg, Maria Biryukov, Lars Wieneke,
and Roman Kalyakin, for the fruitful collaboration described in Section 5.2; and the
Luxembourg National Research Fund (FNR) (10929115), who funded my research.


Open Access. © 2022 Eva Andersen, published by De Gruyter. This work is licensed under
the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783110723991-007 
“black boxes,” uncritical research outcomes, and computational approaches
that replace or downplay the defining research skills of historians. However, 
this does not have to be the case. 

At the same time, within digital history the concept of search has become an 
even more crucial feature of the historian’s research repertoire and workflow, es- 
pecially when big data is involved. Far from supplanting historians’ original re- 
search practices, digital history can provide additional or extended forms of 
search to aid historians in the exploration and analysis of source material. Digital 
tools and approaches do not have to be something utterly foreign: text mining, 
topic modeling, and other derivatives are in essence “different species of search.”2

In the following paragraphs I want to challenge the reader to think about 
how essential search is, what the benefits and drawbacks of different digital 
search tactics are, and what the future of digital search might involve. I draw 
from my own educational background as a “traditionally schooled” historian, 
experimenting with digital sources and tools that enable me to use digital 
search in order to analyze the history of psychiatry from a transnational point 
of view. To contextualize this, I first explain my research project, as well as the 
meaning of search itself. I then focus on the different stages of the historian’s 
research process — including searching for sources in order to find material 
with which to answer research questions, searching for tools in order to man- 
age the exploration of large data sets of collected sources, and searching for 
relevant content within our sources that will facilitate reading, analysis, and 
interpretation — and the importance of search in each of these. 


2 Search: A means to an end 


During my time as an MA history student I developed an interest in the history 
of psychiatry and transnational history.3 For my PhD project I wanted to pursue
these domains on a bigger scale than I had done before4 — to leave behind any
form of nationally contained histories and instead investigate how psychiatric 


2 Statement by Lincoln Mullen, Associate Professor at George Mason University, online
meeting June 25, 2020. 

3 For a basic introduction to both these subjects see: Andrew Scull, Madness in Civilization: A 
Cultural History of Insanity, from the Bible to Freud, from the Madhouse to Modern Medicine 
(London: Thames & Hudson, 2015); Akira Iriye and Pierre-Yves Saunier, The Palgrave Dictio- 
nary of Transnational History (Basingstoke: Palgrave Macmillan, 2009). 

4 Eva Andersen, “De Société de Médecine Mentale de Belgique in Transnationaal Perspectief 
(1869-1900),” Belgisch Tijdschrift Voor Nieuwste Geschiedenis XLVII, no. 4 (2017): 50-82. 
knowledge had circulated throughout Europe during the nineteenth and early
twentieth centuries.5 To answer my research questions I studied the main psychiatric
journal of each of five different countries (Belgium, France, the Netherlands,
Germany, and the United Kingdom) between 1843 and 1925 (an 82-year
period). Together, these sources amounted to a substantial corpus of over 460
volumes and more than 300,000 pages to investigate.7

Scale is undoubtedly one of the main challenges with transnational re- 
search. Digital history, and more specifically digital search, seemed at first 
sight to offer easy solutions to this problem.8 The transnational and digital
turns are becoming more and more intertwined due to source digitization 
which facilitates virtual cross-border research, as well as the growing possibili- 
ties of the search box which make (transnational) research possible at a pace 
and range that was not feasible before.9 As Putnam aptly states, “Digital search
has become the unacknowledged handmaiden of transnational history.”10


5 Some key questions in my research are: How did the transnational sphere influence thinking 
about and the reception of psychiatric concepts and practices? What kinds of negotiations 
took place in psychiatric circles before certain information or ideas were deemed important, 
true, false, or even useless? Why did or didn’t knowledge transfers succeed?

6 I chose these countries mostly for pragmatic reasons: I know the languages spoken
there and was already familiar to some extent with the history of psychiatry in
these specific countries. The timeframe was based on two parameters. Firstly,
1843 is the earliest date that an issue of one of the journals under study
appeared. Secondly, 1925 was chosen as an end date because the journals were harder
to acquire in a digital format after this date. The journals under study were: Bulletin de la Société
de Médecine Mentale de Belgique (Belgium), Annales de la Société Medico-Psychologiques
(France), Psychiatrische Bladen (the Netherlands), Allgemeine Zeitschrift für Psychiatrie und
psychisch-gerichtliche Medicin (Germany), and Journal of Mental Science (the United Kingdom),
although their titles have changed over time.

7 Referring to my sources as a “substantial corpus” is of course relative in terms of the amount 
of data that is used in other historical research, such as the newspaper Impresso project, 
which contains over 5,445,822 pages (see: Matteo Romanello and Maud Ehrmann, “What’s in 
Our Corpus?,” impresso blog, January 23, 2020, accessed August 2, 2021, https://impresso- 
project.ch/news/2020/01/23/state-corpus-january2020.html) or the amount of data that is 
often used in computational sciences. 

8 This does not mean that transnational research did not exist before the introduction of the 
search box and digital repositories, but it was much more time-consuming to write and expen- 
sive to investigate, often resulting in specifically selected examples rather than trying to study 
a particular subject in its entirety. Lara Putnam, “The Transnational and the Text-Searchable: 
Digitized Sources and the Shadows They Cast,” The American Historical Review 121, no. 2 
(2016): 382-3 and 394. 

9 Putnam, “The Transnational and the Text-Searchable,” 377 and 380. 

10 Putnam, “The Transnational and the Text-Searchable,” 377. 
“Search” is not an end goal; it is always a means to an end regardless of
whether we are talking about search in its analog or its digital form. The former 
consists in most cases of skim-reading page after page until a certain title, passage,
phrase, or word catches our eye, often almost as if by accident, in our
search for relevant material.11 Depending on the type of source this can also be
facilitated by searching through physical tables of contents or indexes. In essence
this is what can be called a top-down approach to searching, whereas
digital search applies a bottom-up approach dominated by the search box.12

But if digital search is a bottom-up approach, doesn’t that mean that it is 
something different from what historians are taught? Yes and no. Instead of 
phrases or words catching our eye as we read for hours, they now come to us 
almost instantaneously via digital search.13 Our eyes pick up on certain
passages within a source when browsing manually because we consciously
or unconsciously build a list of words in our minds around the topic
we are studying. For example, to explore the use of mind-altering substances in 
psychiatry we would pay attention to words like alcohol, morphine, addiction, 
or wine. When we search digitally, we still use our same background knowl- 
edge and word lists regarding this topic, only now we enter them into a digital 
interface. Suddenly, digital search seems less alien. 
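
The step from a mental word list to a digital one can be made concrete with a short sketch. It is only an illustration: the sample pages are invented stand-ins for the OCR text of a journal volume, and `search_pages` is a hypothetical helper, not part of any tool discussed in this chapter.

```python
import re
from collections import defaultdict

# The word list from the example above (mind-altering substances).
WORD_LIST = ["alcohol", "morphine", "addiction", "wine"]


def search_pages(pages, words):
    """Return {word: [page numbers]} for whole-word, case-insensitive hits."""
    hits = defaultdict(list)
    for number, text in pages.items():
        for word in words:
            pattern = rf"\b{re.escape(word)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits[word].append(number)
    return dict(hits)


# Invented sample pages; real input would be the plain text of scanned pages.
pages = {
    1: "The patient was given morphine to calm the nerves.",
    2: "Annual report of the asylum committee.",
    3: "Alcohol and wine were forbidden on the wards.",
}

results = search_pages(pages, WORD_LIST)
```

The word list is exactly the background knowledge a historian would bring to manual browsing. The code only changes how quickly the matching pages surface.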

On the other hand, there are certain aspects of digital search that we need 
to be careful about, although through critical reflection and transparency potential
issues can be mitigated. In historical research, one of the
challenges and even dangers of digital search is its seeming simplicity. We all know the search
box and use it daily, either in our personal lives or for our professional activities.
Most of the time we do not think about how we use it or how it works, and
do not take into account the variety of ways in which digital search can trigger
different or skewed results, especially given the many forms that search
and the search box can take.14


11 Regarding the difference between manual browsing and digital searching, see Bob Nichol- 
son, “The Digital Turn,” Media History 19, no. 1 (2013): 59-73; Hieke Huistra and Bram Mellink, 
“Phrasing History: Selecting Sources in Digital Repositories,” Historical Methods: A Journal of 
Quantitative and Interdisciplinary History 49, no. 4 (2016): 220-9; and Adrian Bingham, “The 
Digitization of Newspaper Archives: Opportunities and Challenges for Historians,” Twentieth 
Century British History 21, no. 2 (2010): 225-31. 

12 Nicholson, “The Digital Turn,” 66-7. 

13 This is not to say that this is without its problems. Many scholars have warned about the loss of 
context in these cases and the idea that what is in fact scarce now looks prominent or abundant.

14 Tim Hitchcock, for example, has warned scholars about this on multiple occasions. See, for
instance, “Lecture Tim Hitchcock - Beyond Close and Distant Reading: Recording and Inter- 
view,” June 18, 2019, accessed April 13, 2021, https://www.c2dh.uni.lu/data/lecture-tim- 
3 Searching for the perfect digitized source


The first phase in which we apply digital search is the search for relevant
digital sources. This often means using the Google search engine to,
in my case, find different journals to investigate, or doing a
keyword search within the digital repositories of archives and libraries.

But keyword search is more than just typing words in a search bar. Depend- 
ing on the platform, a variety of options can be offered. These can include 
“basic” options such as introducing a date range, or placing limits on titles, 
genres, source types, or places of publication, as well as using Boolean operators
(such as “AND,” “OR,” and “NOT”) between keywords or using multiword
expressions.15 But there are many other forms of search: fuzzy search, proximity
search, the use of wildcards, and query auto-complete options. These types
of search can be further optimized and improved via the use of correction after
the optical character recognition (OCR) process (post-OCR correction), named
entity recognition, entity linking, sentiment analysis, or topic modeling.17 Many
of these more advanced features are less integrated into the interfaces of online
repositories and, if they are present, are often hidden.18
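
To make some of these search types less abstract, the following sketch shows one possible reading of Boolean, wildcard, proximity, and fuzzy search over a single sentence. It assumes a very simple word tokenizer and uses Python's standard `re` and `difflib` modules. The function names and the sample sentence are hypothetical, and real repositories implement these features in their own, usually undocumented, ways.

```python
import re
from difflib import get_close_matches


def tokens(text):
    """Lowercase word tokens; punctuation and hyphens act as separators."""
    return re.findall(r"\w+", text.lower())


def boolean_match(text, must_have, must_not_have=()):
    """AND over must_have combined with NOT over must_not_have."""
    words = set(tokens(text))
    return (all(w in words for w in must_have)
            and not any(w in words for w in must_not_have))


def wildcard(text, pattern):
    """Tokens matching a pattern in which '*' stands for any run of letters."""
    regex = re.compile(re.escape(pattern).replace(r"\*", r"\w*"))
    return [w for w in tokens(text) if regex.fullmatch(w)]


def proximity(text, first, second, max_gap):
    """True if the two words occur within max_gap tokens of each other."""
    toks = tokens(text)
    pos_a = [i for i, w in enumerate(toks) if w == first]
    pos_b = [i for i, w in enumerate(toks) if w == second]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def fuzzy(text, word, cutoff=0.75):
    """Tokens that resemble `word`, for instance OCR variants."""
    return get_close_matches(word, sorted(set(tokens(text))), n=5, cutoff=cutoff)


# Invented sentence with a typical OCR confusion ("rn" for "m").
sample = ("The non-restraint system was debated; rnorphine and alcohol "
          "were discussed by psychiatrists.")

and_not_hit = boolean_match(sample, ["alcohol", "restraint"], ["wine"])
wildcard_hits = wildcard(sample, "psychiatr*")
near_hit = proximity(sample, "restraint", "system", max_gap=2)
fuzzy_hits = fuzzy(sample, "morphine")
```

Each function returns a different view of the same sentence, which is one reason why two platforms offering "search" over the same source can yield different hits.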

Because we are talking about searching for and identifying digital sources 
for our research it is also important to reflect on the digital sources themselves 
as, aside from the algorithms applied in a search environment, their quality has 
a tremendous impact on search functionality — as well as on the displayed re- 
sults we will later have at our disposal. This is as true for the initial task of lo- 
cating digital sources as for the application of search tactics within sources. 

The quality and accuracy of a digital source is determined by the factors that 
help transform the analog source into a digital version. In this regard, Owens 
made the accurate observation that “all digitized objects are surrogates for the 
originals.”19 This transformation process can be captured in three stages: scanning


hitchcock-beyond-close-and-distant-reading-recording-and-interview (“Recording of the con- 
ference” — see especially from 13 minutes 13 seconds onward). 

15 Maud Ehrmann, Estelle Bunout, and Marten Diiring, “Historical Newspaper User Interfaces: 
A Review” (IFLA WLIC, Athens, 2019), 12, accessed June 17, 2021, http://library.ifla.org/2578/1/ 
085-ehrmann-en.pdf. 

16 Ehrmann, Bunout, and Diiring, “Historical Newspaper User Interfaces,” 12. 

17 Ehrmann, Bunout, and Diiring, “Historical Newspaper User Interfaces,” 14. 

18 Ehrmann, Bunout, and Diiring, “Historical Newspaper User Interfaces,” 12. 

19 Trevor Owens, “Digital Sources & Digital Archives: The Evidentiary Basis of Digital History,”
User Centered Digital Memory blog, December 5, 2015, accessed August 2, 2021, http://
www.trevorowens.org/2015/12/digital-sources-digital-archives-the-evidentiary-basis-of-digital- 
history-draft/. 
the source; optimizing the source to enable more and better search functionalities;
and the online consultation or downloading of sources. 

Firstly, scans can be made by high definition cameras, (semi)automatic book 
scanners or overhead scanners (with or without the use of a V-shaped book cra- 
dle). All make the digital version of a source somewhat disparate from its original 
and can lead to visual and analytical discrepancies between the original and its 
digital copy, as well as between digital versions. This can have a lasting impact 
on the different search capacities that can be integrated. The severity of this im- 
pact depends on the accuracy and completeness of a source (missing, skewed, or 
badly scanned pages), its readability by humans and machines, and its aesthetics 
and visual representation (e.g. the difference between black-and-white, grayscale,
or multitone scans, or the (dis)use of thumb-removal software).20

In a second stage, the sources are optimized to maximize the search func- 
tionalities. The three most important processes we find here are: creating single 
pages out of double scanned pages, which improves the OCR accuracy; apply- 
ing OCR; and applying post-OCR corrections through software. Much can be 
said about OCR software and protocols, but what is important to note is that 
(re)search with digitized sources relies tremendously on the recognition of let- 
ters and words within a corpus. When sources are not properly optimized this 
can lead to discrepancies between the material that can be found within digital 
repositories and the search hits within a source. Poor scanning not only
reduces the readability of the source for a machine (due to inaccurate OCR),
but can also make the source almost unreadable for the researcher,
making even analog search within this digital source less efficient.
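
The effect of inaccurate OCR on search can be illustrated with a small, hedged example. It computes a character error rate (edit distance divided by the length of a correct transcription), which is one common way of expressing OCR accuracy, and shows how an exact keyword search fails on text containing recognition errors. The sample strings are invented, and the measure is a general technique rather than one attributed to any specific repository.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the reference transcription."""
    return levenshtein(reference, ocr_output) / len(reference)


# Invented example: the OCR engine has read every "m" as "rn".
reference = "morphine was administered"
ocr_output = "rnorphine was adrninistered"

distance = levenshtein(reference, ocr_output)
cer = character_error_rate(reference, ocr_output)
found_by_exact_search = "morphine" in ocr_output
```

Even a handful of character errors is enough to hide a passage from an exact keyword search, which is why post-OCR correction or fuzzy matching matters for what a historian can find.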

Lastly, these digital sources are stored on personal hard drives or the serv- 
ers of archives and libraries and, in the case of the latter two, made accessible 
via an online repository. Depending on how carefully the previous steps were 
carried out, digital sources can be found more or less easily. There are still 
some online platforms that do not apply OCR when scanning their source 


20 Arindam Chaudhuri et al., “Optical Character Recognition Systems,” in Optical Character 
Recognition Systems for Different Languages with Soft Computing, ed. Arindam Chaudhuri 
et al., Studies in Fuzziness and Soft Computing, vol. 352 (Cham: Springer International Publishing,
2017), 9-41; Simon Tanner, Trevor Muñoz, and Pich Hemy Ros, “Measuring Mass Text
Digitization Quality and Usefulness: Lessons Learned from Assessing the OCR Accuracy of the 
British Library’s 19th Century Online Newspaper Archive,” D-Lib Magazine 15, no. 7/8 (2009); 
Maya R. Gupta, Nathaniel P. Jacobson, and Eric K. Garcia, “OCR Binarization and Image Pre- 
Processing for Searching Historical Documents,” Pattern Recognition 40, no. 2 (2007): 389-97; 
and Rose Holley, “How Good Can It Get?: Analysing and Improving OCR Accuracy in Large Scale 
Historic Newspaper Digitisation Programs,” D-Lib Magazine 15, no. 3/4 (2009), accessed June 17, 
2021, http://www.dlib.org/dlib/march09/holley/03holley.html.
material, making search possible only via the metadata (e.g. title, year) that are
provided by the institution that stores them. 

Due to the mass digitization of sources, different copies of a single source 
can be found on the internet. This becomes especially visible in repositories such 
as HathiTrust and Archive.org and can potentially lead to different research re- 
sults, depending on the accuracy and quality of each of these copies, which in 
turn determine the degree of search that is possible. It is not always clear which 
copies are better and should be preferred over other digitized copies. There are 
often no clear ways to notify providers about the discrepancies that are some- 
times found within a digitized source either — nor to ask them to rectify this. 

This whole transformation process lays bare two important shortcomings of 
digitized corpora: the historian’s dependence on the input and diligence of 
others in their search for sources, and the efficiency of search and search results.
We often rely on third-party data providers such as libraries or online archives,
and the companies (e.g. Google) they work with, to deliver
complete and well-scanned historical material. Where it was previously
just the historian and stacks of physical sources under their control, there
is now an intermediator standing between the historian and the sources in the 
form of those who scan and provide the material, as well as the machines used 
to make those scans. Of course, some mediation also takes place between the 
historian and the archivist, as the latter often makes a selection of which docu- 
ments are preserved and which are not. Likewise, intermediation has in some 
cases also become less extreme. This is, for example, noticeable in the online 
access of archival catalogues. 


4 Searching for tools: A process of trial and error 


“Searching for tools” does not mean the same as searching for sources in repos- 
itories, developing search tactics, or exploring search results. Nevertheless, 
searching for suitable tools is important when we want to apply digital search. 
Many of the search functionalities mentioned earlier are also found in stand- 
alone (re)search tools. The range of possibilities, algorithms, and online and 
standalone tools that offer all or some of these functionalities seems almost 
endless.21 However, the internal mechanisms and modi operandi of these tools


21 For a broad introductory overview see, for example: Shawn Graham, Ian Milligan, and 
Scott Weingart, Exploring Big Historical Data: The Historian’s Macroscope (London: Imperial 
College Press, 2016). 
are not always explained, or they are difficult to use and understand for inexpe-
rienced users who are unfamiliar with this multitude of search functionalities 
and thus not able to use them properly. 

This takes us to the dreaded “black box” of the digital humanities, which 
can become problematic during the use of digital search if we are not transparent 
about — and careful and consistent with — the research practices applied. This 
amalgam of tools is not necessarily a one-size-fits-all solution for each and every 
research project, although that can be a common misconception. Below I outline 
some of the tools I experimented with to find a form of digital search that fitted 
my project and research questions, and my research workflow as a historian — a 
tool that was also understandable to me. Corpus linguistics, text mining, and 
more specifically keyword search and topic modeling, were the search practices 
and techniques I used to digitally search for relevant content in order to be able 
to analyze the circulation of psychiatric knowledge later on in my research. The 
search tools I explored were Voyant Tools, MALLET, AntConc, and histograph. 

Voyant Tools is a “web-based reading and analysis environment for digital 
texts”22 that is “designed to facilitate reading and interpretive practices for digital
humanities students and scholars as well as for the general public.”23 It is
an example of text mining via “simple” keyword search — but, although it has 
an important goal in mind, I didn’t find the tool suitable for my own research. 

First of all, the web application often reacted extremely slowly or crashed 
due to the large amount of data I had.24 Secondly, as with many other search
and explorative tools, Voyant only shows plain text versions of the data, while 
it could also be valuable to see the original scans next to these. Thirdly, al- 
though Voyant Tools offers 28 different ways to explore and search through a 
corpus, this breadth of options is overwhelming for a beginner. Furthermore, 
not all sub-tools allow the user to switch between the visualization and the text 
file. Lastly, many of the visualization options — let’s call them “visual search” op- 
tions — are nice to look at but, for detailed search tactics and source analysis, they 
will scarcely provide the researcher with the information they are looking for 
(Fig. 1). This is a problem very common with visualizations in the humanities, as 
“beautiful” graphs are often bad representations of data or easily open to 
misinterpretation. 


22 Stéfan Sinclair and Geoffrey Rockwell, “Voyant Tools,” accessed September 19, 2020, 
https://voyant-tools.org/. 

23 Stéfan Sinclair and Geoffrey Rockwell, “About - Voyant Tools Help,” accessed September 19,
2020, https://voyant-tools.org/docs/#/guide/about. 

24 Although running Voyant Tools on your own computer is possible, I was not aware of this 
feature before I abandoned this tool. 
Aside from the use of keyword search, where a certain amount of background
knowledge is required from the researcher, another technique that allows digital
search is topic modeling. Put very simply, a computer algorithm
tells the researcher which topics or subjects are present in a certain corpus.
Topic modeling works as follows: a document (e.g. a book, journal issue, or ar- 
ticle) consists of a collection of words and via a statistical process the computer 
classifies these words into sets of words that occur frequently together, forming 
different topics in the process. 
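
As an illustration of such a statistical process, the sketch below implements a very small version of latent Dirichlet allocation (LDA) with collapsed Gibbs sampling, a widely used topic modeling technique. It is a teaching sketch under stated assumptions and not the implementation of any tool mentioned in this chapter: the documents are invented, and the parameter names and default values (`num_topics`, `alpha`, `beta`, `iterations`, `seed`) are chosen here for illustration only.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    # Start by giving every word occurrence a random topic.
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [Counter(z) for z in assignments]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for doc, z in zip(docs, assignments):
        for word, topic in zip(doc, z):
            topic_word[topic][word] += 1
            topic_total[topic] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                old = assignments[d][i]
                # Take the current assignment out of the counts.
                doc_topic[d][old] -= 1
                topic_word[old][word] -= 1
                topic_total[old] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = new
                doc_topic[d][new] += 1
                topic_word[new][word] += 1
                topic_total[new] += 1
    return topic_word


def top_words(topic_word, n=3):
    """The n most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0]
            for counts in topic_word]


# Invented mini-corpus; each document is a list of words.
docs = [
    "asylum restraint patient asylum restraint".split(),
    "restraint patient asylum attendant".split(),
    "morphine alcohol wine addiction morphine".split(),
    "alcohol wine addiction morphine".split(),
]

two_topics = top_words(lda_gibbs(docs, num_topics=2))
three_topics = top_words(lda_gibbs(docs, num_topics=3))
```

Running the sampler with a different number of topics or a different random seed can produce different word groupings from the same corpus, which is why these choices need to be documented alongside the results.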

One of the topic modeling tools that I briefly explored was the Topic Modeling
Tool (TMT), which provides a graphical user interface (GUI) for MALLET.25
MALLET, which is used via the command line, is a topic modeling toolkit
that is very frequently used within the humanities.26 I only conducted a few experiments
with the GUI for MALLET because the tool represented a black box
for me. The exact mechanics behind the algorithms and the different settings 
and options that could be selected and implemented were not clear to me. In 
cases like this especially, continuing to use this kind of search tool would have 
led me to make errors in my analysis and conclusions later on. 

Aside from these explorations I also began to work with AntConc and histo- 
graph. Both tools were used extensively during my (re)search process. In Sec- 
tion 5 I highlight and explain the different digital search tactics I applied in 
order to show their drawbacks and benefits. 


5 Applying search tactics to locate what to read 


The main reason I made use of various different forms of search was to over- 
come the obstacle of the overabundance of source material that I had acquired. 
This overabundance often made it difficult to begin a valuable
and thorough analysis of my sources. Without computational support it was
not only difficult to search for and provide answers to specific research ques- 
tions but also to, for example, locate interesting and useful subjects that could 
serve as case studies. 


25 Jonathan Scott Enderle, “Senderle/Topic-Modeling-Tool,” August 30, 2020, https://github. 
com/senderle/topic-modeling-tool. 

26 Other functionalities include: statistical natural language processing, document classifica- 
tion, clustering, information extraction, and other machine learning applications to text. See: 
“MALLET,” accessed September 19, 2020, http://mallet.cs.umass.edu/. 
So how to search this goldmine of psychiatric journals? How could I harness
digital search to get control over the sheer volume of my sources? These
questions were a constant concern. Without a digital search tactic, I could not 
start to carry out this essential step of my research workflow as a historian. 
Without it I would not be able to close-read crucial parts of these journals for 
my analysis of psychiatric knowledge circulation across Europe. 

Historians Damerow and Wintergrün have demonstrated the importance of
having full control over a corpus even within “a digital framework” because
“historical research relies on trust in its sources.”²⁷ This trust in sources is a
precarious balancing act for historians. How can we find and trace knowledge 
circulation in these substantial corpora? Where and how should we start dis- 
tant reading, and later on close reading? How can we find relevant information 
for close reading? How do we zoom in and out of the material? How accurate is 
the output of the search tools? These were some of the challenges I faced in 
seeking one or multiple search tactics. As will become clear in this section, key- 
word search would form a major part of my research tactics but would take on 
different forms. 


5.1 AntConc 


AntConc is an off-the-shelf application developed in 2014 by Laurence Anthony.²⁸
Its goal is to make textual analysis and explorative research of text
files easier and more manageable. The tool can create concordance tables, 
n-gram clusters, and collocations, among other outputs. To make my use of this 
tool more explicit, and to contextualize it within the scope of the transnational 
history of psychiatry, I framed my search exploration with AntConc around the 
non-restraint system that came into vogue during the nineteenth century. Dur- 
ing this time, debates were held about the (un)suitability of using mechanical 
restraints such as straitjackets and iron cuffs on patients.²⁹ Laying bare the


27 Julia Damerow and Dirk Wintergrün, “The Hitchhiker’s Guide to Data in the History of
Science,” Isis 110, no. 3 (2019): 513-21. 

28 Laurence Anthony, “AntConc Homepage,” accessed September 19, 2020,
https://www.laurenceanthony.net/software/antconc/.

29 Some were of the opinion that this formed a necessary part of therapy, as well as a practi- 
cal element necessary to keep control over a large number of patients. Others were of the opin- 
ion that this had no therapeutic value at all and that patients did not have to be confined in 
this manner, but instead should be able to walk around freely and allowed to enjoy the outside 
air, games, or working in the kitchen or gardens of the asylum. 
non-restraint debate across time and space by manually combing through all
the journals to search for relevant articles would have been an immensely time- 
consuming task. However, by using AntConc as a search tool, in combination 
with close reading, this became more feasible. 


5.1.1 Concordance plots and the vitality of keyword lists 


The main feature of AntConc that I used to search through my corpus was the 
concordance plot.³⁰ This component shows concordance results plotted in what
Laurence Anthony calls a “barcode” format (Fig. 2).³¹ This allows you to see the
position of one or multiple search terms in different documents in an abstract 
representation. Each line in the barcode visualization (distant reading) is click- 
able and brings forward the uploaded text file for close reading, highlighting the 
selected keyword(s). In the case of my research about non-restraint I always 
worked with multiple keyword lists, as this search tactic made tracing relevant 
spots within the corpus (keyword clusters) easier and more consistent. Below I 
explain why this was the case and why it can be a useful search strategy. 
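
The barcode idea can be illustrated with a minimal Python sketch. It is not AntConc's implementation: the function names `barcode_positions` and `render_barcode`, the sample text, and the decision to express each hit as a fraction of the document length are all assumptions made for illustration.

```python
import re

def barcode_positions(text, keywords):
    """Return sorted (relative_position, keyword) hits for one document.

    relative_position is the character offset of a hit divided by the
    document length: 0.0 is the start, values near 1.0 are the end.
    """
    if not text:
        return []
    hits = []
    lowered = text.lower()
    for keyword in keywords:
        # re.escape keeps hyphens and other punctuation literal.
        for match in re.finditer(re.escape(keyword.lower()), lowered):
            hits.append((match.start() / len(text), keyword))
    return sorted(hits)

def render_barcode(hits, width=40):
    """Draw a one-line text 'barcode': '|' marks a hit, '.' marks none."""
    cells = ["."] * width
    for position, _ in hits:
        cells[min(int(position * width), width - 1)] = "|"
    return "".join(cells)

# Hypothetical sample document, invented for illustration.
sample = (
    "The asylum report opens with finances. Dr. Conolly defended "
    "non-restraint at length. Later pages list staff. The committee "
    "rejected mechanical restraint once more."
)
hits = barcode_positions(sample, ["non-restraint", "mechanical restraint"])
print(render_barcode(hits))
```

Each `|` corresponds to one place worth close reading. Running the same keyword list over every file in a corpus would give one such line per document, which is roughly what the plot view offers at a glance.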

The building of keyword lists proved essential with this form of text mining. 
As a point of departure, I used the terms “non-restraint” and “mechanical re- 
straint,” two phrases that typified the core of the debate. However, this did not 
capture all places where the debate was mentioned. It is important that the re- 
searcher already has some understanding of the subject at hand in order to 
make decisions about the terms that will be included (e.g. knowing which 
terms were customary). In a second stage I added other keywords that repre- 
sented these restraint systems, such as “padded room” and “straitjacket.” 

Due to the use of corpora in multiple languages, I compiled a list of terms 
associated with restraint and non-restraint for each language. This was accom- 
plished by, on the one hand, translating already-known terms to other lan- 
guages, but also by alternating between distant and close reading: examination 
of specific sections within a source revealed variations of word use within each 
language. This is a strategy that has also been highlighted by Berridge et al.³²


30 Different features are explored in more detail by my colleague Jolien Gijbels and me in the
following blogpost: Jolien Gijbels and Eva Andersen, “AntConc, Historians and Their Diverging
Research Methods,” Digital History & Hermeneutics blog, August 11, 2020, accessed April 13, 2021,
https://dhh.uni.lu/2020/08/11/antconc-historians-and-their-diverging-research-methods/. 

31 See AntConc Help file at Anthony, “AntConc Homepage.” 

32 Alex Mold and Virginia Berridge, “Using Digitised Medical Journals in a Cross European 
Project on Addiction History,” Media History 25, no. 1 (2019): 85-99. 
Furthermore, I took the different spelling variations of keywords, some due to
OCR mistakes, into account where possible (e.g. no restraint, no-restraint, non- 
restraint, non“restraint). 
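
One way to fold such spelling and OCR variants into a single search is a tolerant regular expression. The sketch below is only an illustration under stated assumptions: the pattern, the list of separator characters, and the helper name `find_variants` are hypothetical, and a real keyword list would need one such pattern per term and per language.

```python
import re

# Assumption: variants differ only in the prefix ("no" or "non") and in the
# separator, which OCR may render as whitespace, a hyphen, a quotation mark,
# or nothing at all. The trailing "-" in the class is a literal hyphen.
SEPARATORS = "[\\s\"'\u201c\u201d\u2010\u2011-]*"
NON_RESTRAINT = re.compile(
    r"\bnon?" + SEPARATORS + r"restraint\b", re.IGNORECASE
)

def find_variants(text):
    """Return every matched spelling variant, in order of appearance."""
    return [match.group(0) for match in NON_RESTRAINT.finditer(text)]

# Hypothetical sample, including a line-break hyphenation and an OCR error.
sample = (
    "no restraint, no-restraint, non-\nrestraint, "
    "non\u201crestraint, emotional restraint"
)
print(find_variants(sample))
```

The pattern deliberately skips the bare word "restraint", which is too ambiguous on its own. It will still match an ordinary phrase such as "there was no restraint", so the hits remain candidates for close reading rather than confirmed references to the debate.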

Compiling these keyword lists needs to be done thoroughly, as the creation 
of too limited or too generic or broad a list can create its own problems. Not 
considering one or multiple keywords can have an impact on the output results 
of search queries, potentially misleading the researcher. This became tangible 
while analyzing the German psychiatric journal. I had started out with a limited 
set of keywords for this particular language, due to my limited knowledge of 
German. But by translating some of the terms found in the other corpora while 
combining this with close and distant reading, relevant sections within the cor- 
pus were highlighted more distinctly and gave a more concrete image of rele- 
vant starting points for further corpus exploration (Fig. 2). 

While sparse keyword lists can miss relevant spots in a corpus, the use of 
words that are too generic can clutter the results and create a mass of data that 
is not easily processed, as other research has also shown.³³ To give an example:
the French word “cellule” could either refer to an isolation cell or human cells. 
The word “restraint” could refer to non-restraint, mechanical restraint, or
emotional/behavioral restraint, which is why I opted to use specific words that would
not be ambiguous in their use (e.g. isolation cell and restraint absolu). I used 
the same search tactic for zooming in, gathering and extracting information 
about the editors and editorial decisions, or references to international confer- 
ences, from the journals. 

A key drawback of AntConc was that, although relevant sections became 
easier to spot, reading these sections was less straightforward due to its use of 
text files only: a representation of the source that does not correspond to the 
original from an aesthetic or visual point of view. Unstructured text files are not 
always efficiently readable for the human eye. In order to make close reading 
possible I was obliged to switch between AntConc’s visualization, the text files, 
and the original PDF documents of the sources - the latter to stay as close to 


33 Hinke Piersma and Kees Ribbens, “Digital Historical Research: Context, Concepts and the 
Need for Reflection,” BMGN - Low Countries Historical Review 128, no. 4 (2013): 78-102; Hieke 
Huistra, “Experts by Experience. Lay Users as Authorities in Slimming Remedy Advertise- 
ments, 1918-1939,” BMGN — Low Countries Historical Review 132, no. 1 (2017): 126-148; Virginia 
Berridge, Jennifer Walke, and Alex Mold, “From Inebriety to Addiction: Terminology and Con- 
cepts in the UK, 1860-1930,” The Social History of Alcohol and Drugs 28, no. 1 (2014): 88-105; 
and Virginia Berridge et al., “Addiction in Europe, 1860s-1960s Concepts and Responses in 
Italy, Poland, Austria, and the United Kingdom,” Contemporary Drug Problems 41, no. 4 
(2014): 551-66. 
reading the original source as possible. This was a time-consuming workflow
which would be improved by using the histograph web app. 

The search tactics with keyword lists used in AntConc require substantial
background knowledge of a subject in order to thoroughly study it via the
barcode visualization. As a result, many other topics stay hidden
from the researcher. Via other search tactics (e.g. topic modeling), this can be
overcome to a certain extent.


5.2 Histograph 


Although the use of AntConc solved some problems, the sheer volume and di- 
versity of the corpus still posed challenges: How can a historian find and trace 
relevant information to analyze the evolution of specific ideas throughout large 
corpora? The use of off-the-shelf applications could only go so far. This
stimulated a collaboration between me and my colleagues at the C²DH.³⁴ This
cooperation, which included major brainstorming sessions about data quality and
inconsistently digitized corpora, as well as the nature of the research project, 
made sure that the search workflow could stay as close as possible to my own 
research process. 

As a result of this collaboration I did not have to adapt to the constraints of 
a specific tool, as is often the case, but rather vice versa. This approach simulta- 
neously provided me with a better understanding of the technical processes op- 
erating in the background, avoiding the black box effect. The result was a tool 
for corpus exploration modeled on an earlier version of histograph. Initially de- 
veloped to provide “graph-based exploration and crowd-based indexation for 
multimedia collections,” through which related documents could be discovered 
via filtering entities, date ranges, and document types, histograph also reveals 
relationships between people and keeps track of relevant documents.³⁵ My


34 For a more detailed excursion into the necessity of collaboration between historians and 
computer scientists, as well as for all technical details about the processes and algorithms 
used, see: Eva Andersen et al., “How to Read the 52.000 Pages of the British Journal of Psychi- 
atry? A Collaborative Approach to Source Exploration,” Journal of Data Mining and Digital Hu- 
manities (HistoInformatics), 2020. 

35 “Histograph,” accessed January 5, 2021, http://histograph.eu/; Jasminko Novak et al., “His- 
toGraph: A Visualization Tool for Collaborative Analysis of Networks from Historical Social 
Multimedia Collections” (18th International Conference on Information Visualisation, Paris, 
2014), 241-50; Lars Wieneke et al., “Building the Social Graph of the History of European Inte- 
gration,” in Social Informatics, ed. Akiyo Nadamoto et al. (Berlin, Heidelberg: Springer, 2014), 
86-99; and Marten Düring, Lars Wieneke, and Vincenzo Croce, “Interactive Networks for
colleagues adapted the first version of histograph to fit my particular research
and sources — e.g. adding topic modeling and visualizations to maximize the 
search functionalities. 


5.2.1 Establishing an optimal way to search content 


As seen earlier, historical corpora are often unstructured and irregular due to 
poor OCR quality and textual errors, as well as the lack of a regular volume 
structure which, for example, makes the detection of individual articles within 
a corpus extremely difficult. Some preprocessing steps were required to make 
our exploration tool useful. Firstly, this included choosing a logical boundary 
unit - in this case, at the page level (one page = one document). Choosing this 
boundary also meant that the structure of the documents would be the same 
across the multiple corpora. A second preprocessing step involved removing 
stop words and maximizing the use of content-bearing pages. 
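
These two preprocessing steps can be sketched as follows. This is an illustration only, not the pipeline that was built for histograph: the form feed page separator, the tiny stop word list, and the thresholds `min_length` and `min_tokens` are all assumptions.

```python
import re

# Assumption: pages are separated by a form feed character, as in the plain
# text output of some OCR and PDF-to-text tools. Real corpora may differ.
PAGE_BREAK = "\f"

# A deliberately tiny, hypothetical stop word list for illustration.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "was", "is"}

def split_into_pages(volume_text):
    """Treat every page as one document and drop pages without any text."""
    return [page for page in volume_text.split(PAGE_BREAK) if page.strip()]

def tokenize(page, min_length=3):
    """Lowercase, keep runs of letters, drop stop words and short debris."""
    # [^\W\d_]+ matches Unicode letters, so accented words survive.
    tokens = re.findall(r"[^\W\d_]+", page.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_length]

def content_bearing(pages, min_tokens=3):
    """Keep only pages that still carry enough tokens after cleaning."""
    tokenized = [tokenize(page) for page in pages]
    return [tokens for tokens in tokenized if len(tokens) >= min_tokens]

# Hypothetical three-page volume: the middle page holds only a page number.
volume = "The asylum report of the county.\f\f12\fMorphia was a sedative in mania."
documents = content_bearing(split_into_pages(volume))
print(documents)
```

Using the page as the unit sidesteps article detection entirely, at the cost of splitting articles that run across several pages.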

Instead of relying on concordance plots, collocations or n-grams, we used 
topic modeling to enable more control over the corpora and the search func- 
tionality. This enabled me to discover which topics were covered in the journal, 
where, and to what extent — which was important for being able to select rele- 
vant parts of the corpora for close reading. 

We used non-negative matrix factorization (NMF)³⁶ instead of the latent
Dirichlet allocation (LDA)³⁷ that is more often used within humanities research.
This approach was chosen because instead of specifying three parameters, as is
the case with LDA, only two needed to be specified (the number of topics and
the number of words in a list). Furthermore, when applying LDA the words per
topic will change every time the program runs over a set of documents. With
NMF this is not the case, which provides better topic stability from a historical
point of view and made my (re)search more consistent.
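
To show what the two parameters control, here is a small pure-Python sketch of NMF using the multiplicative update rules described by Lee and Seung. It is not the implementation used in the project. The toy page-by-word counts, the vocabulary, the fixed starting values, and the function names are assumptions chosen for illustration. Because the starting values are fixed rather than random, repeated runs return identical topics.

```python
def transpose(M):
    return [list(column) for column in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(V, n_topics, n_iter=200, eps=1e-9):
    """Approximate V (pages x words) as W x H with non-negative factors."""
    n_docs, n_words = len(V), len(V[0])
    # Fixed, strictly positive starting values (an arbitrary choice).
    W = [[1.0 + (i + k) % n_topics for k in range(n_topics)]
         for i in range(n_docs)]
    H = [[1.0 + (k + j) % n_topics for j in range(n_words)]
         for k in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element by element
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_words)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element by element
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return W, H

def top_words(H, vocabulary, n_words):
    """List the n_words highest-weighted words for every topic."""
    topics = []
    for row in H:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        topics.append([vocabulary[j] for j in ranked[:n_words]])
    return topics

# Hypothetical word counts for four pages (rows) and six words (columns).
vocabulary = ["asylum", "patient", "attendant", "morphia", "dose", "sedative"]
V = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
    [0, 1, 0, 2, 2, 3],
]
W, H = nmf(V, n_topics=2)           # parameter 1: the number of topics
print(top_words(H, vocabulary, 3))  # parameter 2: the number of words per list
```

The rows of `H` hold the word weights per topic, and the rows of `W` indicate how strongly each page belongs to each topic, which is what allows a topic to be linked back to pages for close reading.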


Digital Cultural Heritage Collections — Scoping the Future of HistoGraph,” in Engineering the 
Web in the Big Data Era, ed. Philipp Cimiano et al. (Cham: Springer International Publishing, 
2015), 613-6. 

36 Daniel D. Lee and H. Sebastian Seung, “Learning the Parts of Objects by Non-Negative Ma- 
trix Factorization,” Nature 401 (October 1999): 788-91. 

37 Topic modeling (TM) was made popular in machine learning by Blei et al. via the use of 
LDA, one of the many different approaches to TM. Blei and others have published frequently 
on the use of TM and LDA. David M. Blei, Andrew Y. Ng, and Michael I. Jordan, “Latent Dirichlet
Allocation,” Journal of Machine Learning Research 3 (2003): 993-1022; David M. Blei,
“Probabilistic Topic Models,” Communications of the ACM 55, no. 4 (2012): 77-84; and David M. Blei,
“Topic Modeling and Digital Humanities,” Journal of Digital Humanities 2, no. 1 (2012). 
To study psychiatric phenomena across a wide timespan my colleagues
generated two kinds of topic: “window topics”, the standard calculation of X
topics (where X = 10 to 20 in my case) for each year of a corpus (Fig. 3), and
“dynamic topics”, which are less standardized but allow better spotting of the
development of psychiatric topics and their (vocabulary) variations through time.³⁸
A good example is the development of general paralysis (GP) as a psychiatric dis- 
ease in the nineteenth century and its connection with syphilis, as well as with 
the technical developments that took place within medicine. GP was a mental 
disease in which patients slowly lost all mental and motor functions — including 
total loss of speech, writing abilities, and movement - often paired with halluci- 
nations and dementia. No cure existed throughout the nineteenth century and 
for some time into the early twentieth century, until penicillin was discovered 
and mass produced from the 1940s. During the time period under consideration, 
a discussion surfaced regarding syphilis as a possible cause of GP. 


| Rank | 1876_01        | 1876_02     | 1876_03        |
+------+----------------+-------------+----------------+
| 1    | asylum         | case        | morphia        |
| 2    | insane         | brain       | dose           |
| 3    | patient        | left        | mania          |
| 4    | medical        | cell        | injection      |
| 5    | report         | side        | sleep          |
| 6    | hospital       | nerve       | excitement     |
| 7    | case           | vessel      | sedative       |
| 8    | lunatic        | paralysis   | vomit          |
| 9    | attendant      | frontal     | patient        |
| 10   | superintendent | centre      | hypodermic     |
| 11   | association    | general     | case           |
| 12   | number         | part        | hour           |
| 13   | treatment      | muscle      | night          |
| 14   | appoint        | motor       | acute          |
| 15   | class          | disease     | effect         |
| 16   | officer        | convolution | subcutaneously |
| 17   | county         | blood       | action         |
| 18   | committee      | layer       | drug           |
| 19   | bethlem        | eye         | hypnotic       |
| 20   | great          | corpus      | administer     |
+------+----------------+-------------+----------------+


Fig. 3: Window topic word assignment. This figure displays three topics that are present in the 
British Journal of Psychiatry, 1876. Via the words associated with each topic, a tag could 
easily be assigned for each subject: the first topic is about asylum management, the second 
is broadly related to neurology, and the third is about drugs. 2019. © Eva Andersen. 


38 These dynamic topics are no longer based on the original page content but on the window 
topics that were created earlier. Derek Greene, “Derekgreene/Dynamic-Nmf,” September 16, 
2020, accessed June 23, 2021, https://github.com/derekgreene/dynamic-nmf. 
When reviewing the keywords in the dynamic topic, a couple of interesting
points can be observed. First of all, the British corpus I used begins in the 1850s, 
while the dynamic topic indicates that the word “syphilis” in connection with GP 
only appears from the 1880s onwards. This tells us at a glance when the connec- 
tion between GP and syphilis became more central. Secondly, the vocabulary used 
became more technical around the turn of the century. This is for example notice- 
able in the use of the words “spinal” and “wassermann.” Both these terms refer 
to August Paul von Wassermann and his Wassermann test, which was developed 
to discover the presence of syphilis via the extraction of blood and/or spinal fluid. 


5.2.2 Multiple ways of searching and exploring via an interface 


While I found this raw topic modeling output understandable and usable, it 
also required my repeatedly switching between the given output and the digi- 
tized sources in order to read the content of the psychiatric journals. As with 
AntConc, this slowed the exploration process down considerably. This was im- 
proved by importing the topic modeling pipeline into histograph, thus creating 
a direct link between the topic modeling output, the digitized sources, and the 
textual transcription of the sources. 

Making use of histograph enabled me to integrate multiple ways of searching — 
a necessary search strategy to explore the corpus to its fullest. As Coles et al. have 
said, “[. . .] distant reading visualizations cannot replace close reading, but they 
can direct the reader to sections that may deserve further investigation.”³⁹ One of
the many advantages that these search tactics brought to the fore was that this 
kind of tool can be a valuable addition to the use of more conventional methods 
(such as finding information only via tables of contents or indexes). Below I high- 
light some of the different search mechanisms that I used. 

A first way to explore the corpus was via the visualization of the generated 
topics (Fig. 4). Based on the tone and size of the dots shown in the visualization I 
could observe how often a certain topic appeared over time. This was especially 
useful in searching for and selecting subjects (such as general paralysis) that 
could function as case studies for my PhD dissertation. In addition, there is the 
possibility to zoom in and out of this dot-visualization in order to view the pages 
related to a specifically selected year. 


39 After Coles et al. in: Stefan Jänicke et al., “On Close and Distant Reading in Digital
Humanities: A Survey and Future Challenges,” in Eurographics Conference on Visualization (EuroVis) -
STARs, ed. Rita Borgo, Fabio Ganovelli, and Ivan Viola (The Eurographics Association, 2015), 9. 
Using these search approaches within histograph helped me extract relevant
content, and especially to discover otherwise hidden content and “weak
signals” around very distinct topics, which would not have been made visible 
by a manual search approach. Sometimes there were considerable discrepan- 
cies between what could be found within a table of contents and the relevant 
locations suggested by the system. Furthermore, histograph allowed me to find 
the proverbial “needle in a haystack” while I investigated a particular theme. 

To come back to the example of GP: within historical research there has been 
quite some emphasis on physicians, alienists, and syphilographers developing a 
cure for this disease. However, their interest in GP entailed more than a race for a
cure. Their research would take many directions and become quite diverse. Medi- 
cal practitioners, for example, conducted research on the sense of smell in GP 
patients, the presence of peptone in urine and changes in body temperature - all 
of which were examined as possible indicators of the disease.⁴⁰ Although these
instances are rather infrequent, they do help paint a broader picture of physi- 
cians’ and alienists’ interest in GP. Without the use of digital search, and more 
specifically the use of the above-described search tactics in histograph, these in- 
stances would have been almost impossible to trace. 

A second way of searching within histograph was configured by clicking on 
a specific topic, for which all related pages were then displayed. Via additional 
keyword search these results could then be fine-tuned to my specific interests. 
Aside from a connection between GP and syphilis, many other potential causal 
links to GP were generated. To get a better grasp on these I was able to, for exam- 
ple, study GP and its relation to alcoholism. In this case, histograph displayed 
only those pages on which the words “general paralysis” and “alcoholism” ap- 
pear together within the selected GP topic (Fig. 5). This facilitated the search for 
only relevant content within the psychiatric journals, reducing the number of 
pages that I needed to close read. 
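
The logic of this second search mode, a topic filter followed by a keyword co-occurrence filter, can be sketched in a few lines. The code does not reproduce histograph's interface or data model: the page records, their keys (`id`, `topic`, `text`), and the function name `pages_with_all` are hypothetical.

```python
def pages_with_all(pages, keywords, topic=None):
    """Return ids of pages that contain every keyword (case-insensitive).

    `pages` is a list of dicts with the hypothetical keys "id", "topic",
    and "text". If `topic` is given, only pages assigned to it are checked.
    """
    wanted = [keyword.lower() for keyword in keywords]
    selected = []
    for page in pages:
        if topic is not None and page["topic"] != topic:
            continue
        # Collapse line breaks and repeated spaces before matching phrases.
        text = " ".join(page["text"].lower().split())
        if all(keyword in text for keyword in wanted):
            selected.append(page["id"])
    return selected

# Hypothetical page records, invented for illustration.
pages = [
    {"id": "1885_p012", "topic": "GP",
     "text": "General paralysis and alcoholism in male patients."},
    {"id": "1885_p040", "topic": "GP",
     "text": "General paralysis following syphilis."},
    {"id": "1886_p101", "topic": "asylum management",
     "text": "Alcoholism and general\nparalysis among admissions."},
]
print(pages_with_all(pages, ["general paralysis", "alcoholism"], topic="GP"))
```

Dropping the `topic` argument widens the search to the whole corpus, which makes the trade-off visible: fewer pages to close read with the topic filter, more potential finds without it.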

A third method of searching involved using “keyword mentions.” One or 
multiple keywords were specified by me and were displayed via a bar chart at 
the same time, also making it possible to directly access the pages with these 
particular words. This proved to be a useful search tactic to study for example 
the presence of specific psychiatrists. One of them was the internationally re- 
nowned Belgian alienist Jules Morel. By implementing keywords with different 
variations of the spelling of his name (jules morel, jul. morel and j. morel) a 


40 Jules Morel, “Un Nouveau Signe Diagnostiqué de la Paralysie Générale Progressive, par le
Dr Marro,” Bulletin de la Société de Médecine Mentale, no. 48 (1888): 196,
http://hdl.handle.net/2027/mdp.39015070250769; and, “De la Température dans la Paralysie
Générale, par F. Peterson,” Bulletin de la Société de Médecine Mentale, no. 71 (1893): 468.
straightforward overview of his name’s occurrence within the British psychiatric
journal was generated, which created a basis for more in-depth exploration.
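
A rough equivalent of such a “keyword mentions” count is sketched below. The name variants come from the example above. Everything else is an assumption for illustration: the `(year, text)` page format, the sample sentences, and the text-only bar chart stand in for histograph's actual data and visualization.

```python
import re
from collections import Counter

NAME_VARIANTS = ["jules morel", "jul. morel", "j. morel"]

def mentions_per_year(pages, variants):
    """Count occurrences of any spelling variant, grouped by year.

    `pages` is a list of (year, text) pairs.
    """
    alternatives = "|".join(re.escape(variant) for variant in variants)
    # (?<!\w) stops a match from starting in the middle of another word.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")\b", re.IGNORECASE)
    counts = Counter()
    for year, text in pages:
        counts[year] += len(pattern.findall(text))
    return counts

def text_bar_chart(counts):
    """Render one line per year: the year, one '#' per mention, the total."""
    return "\n".join(
        f"{year} {'#' * counts[year]} {counts[year]}" for year in sorted(counts)
    )

# Hypothetical pages, invented for illustration.
pages = [
    (1888, "Dr. Jules Morel of Ghent presented. J. Morel replied."),
    (1888, "No foreign members attended."),
    (1893, "Review by Jul. Morel."),
]
counts = mentions_per_year(pages, NAME_VARIANTS)
print(text_bar_chart(counts))
```

A variant list like this remains open-ended: OCR errors in a surname, for instance, would need further patterns or a fuzzy matching step.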

Via the use of these different layers and their accompanying search tactics, I
had the opportunity to be more precise, as well as flexible about what I wanted
to investigate. During my research process I used these layers for different
purposes, ranging from the discovery of relevant study subjects via topic model- 
ing, to the discovery and fine-tuning of my already-selected case studies via “key- 
word mentions.” These examples are of course also traceable in a similar fashion 
through tools such as AntConc or Voyant Tools which, to a certain extent, use 
similar mechanisms. However, with histograph, due to the incorporation of dif- 
ferent search strategies as well as its more “natural” visualization of the sources, 
the research process was improved and accelerated. 


6 Conclusion: Digital search as an extension 
of the historian’s workflow 


One of the first tasks in the historian’s research workflow is “search.” This is 
where all historical research begins and it is one of many skills that historians 
are proficient in. However, within the scope of my PhD project researching the 
dissemination of psychiatric knowledge across the nineteenth and early twenti- 
eth centuries, it became apparent that using only an analog search approach, 
either for finding my sources (psychiatric journals) or gathering all relevant 
text fragments within my sources, would not be sufficient as a tactic. This is 
where digital search became a central aspect of my research workflow. 

An implied question that runs throughout this chapter is to what extent the
historian’s skill in analog or traditional search is (dis)similar to that of digital
search, and whether the latter undermines the former. This cannot be answered
with a simple yes or no. Firstly, in essence, analog search and digital
search are not that different: their common factor being keywords. Further- 
more, analog search often remains present — whether consciously or uncon- 
sciously — within the boundaries of digital search. With digital search we are 
directed to potentially relevant sections within a source. But as historians we 
will always investigate these specific pages in more detail. It is within this pro- 
cess of close reading specific sections that we (un)consciously apply traditional 
search. If we, for example, were being directed to a section about non-restraint 
via a digital keywords search, our eye might be caught (just as in analog 
search) by certain other words or phrases that may be relevant and which could 
help to fine-tune our search tactics. 
However, this does not mean that we do not need to be aware of some aspects
that make digital search distinct from analog search. This awareness
starts with the realization that the ways in which one can digitally search (includ- 
ing keyword search, fuzzy search, topic modeling, time range selection, etc.) are 
far more extensive than when we talk about analog search. In addition, not every 
project or research question benefits from the same search tactics — and refining 
our search approach is a process that often involves trial and error. 

Secondly, digitization, including the multiple functionalities of the search 
box, “[. . .] opens shortcuts that enable ignorance as well as knowledge.”⁴¹ We
need to be aware of the pitfalls that can await the historian when applying digital 
search. The impact on search possibilities and strategies can be quite tremendous 
and starts with the digitization of sources. From the scanning machines and scan 
settings used, through the choice of OCR software and its accuracy, to the
source’s document format - all have an impact on which search tools the historian
can ultimately use. 

Thirdly, after locating our digital sources and deciding on our search tools, 
there comes the problem of developing one or multiple search tactics. Because 
the information that the historian is looking for is often complex it is better to 
make use of multiple search tactics. Diverse search functionalities can help us 
reassess our existing knowledge of particular topoi within history more easily, 
and can also lead us to discover new or forgotten subjects of interest. Of course, 
each search function comes with its own opportunities and drawbacks. However,
I think that as long as we try to fully understand these functions and are
transparent about how we use them - not forgetting to combine distant and
close reading — using multiple approaches to digital search can contribute to 
many realms of historical research, including my own fields of transnational 
and psychiatric history. 

The question now is whether the many technologies available to assist us 
in searching and gathering information can also enable us to absorb
information faster (e.g. speeding up information processing).⁴² The effort needed to
interpret and close read texts on past events takes time. While the action of
interpreting has not sped up as rapidly as technological innovations — we are 
just human after all - I do think that using digital search tools can speed up cer- 
tain parts of the search process, as well as the further exploration and analysis of 


41 Putnam, “The Transnational and the Text-Searchable,” 379. 

42 Lara Putnam, “Daily Life and Digital Reach: Place-Based Research and History’s Transna- 
tional Turn,” in Theorizing Fieldwork in the Humanities: Methods, Reflections, and Approaches 
to the Global South, ed. Shalini Puri and Debra A. Castillo (New York: Palgrave Macmillan, 
2016), 174. 
sources. This is especially true when either looking for specific information in a
large volume of data (that elusive needle in a haystack) or wanting to investigate 
large amounts of data over multiple years and corpora. 

With this in mind, I also want to briefly highlight a couple of interesting 
avenues that still need to be explored in relation to overcoming some of the lim- 
itations that digital search currently experiences. Further research efforts need 
to be invested in topic modeling across languages, making dynamic and cross- 
lingual explorations possible. When it comes to transnational knowledge ex- 
change, cross-lingual exploration might be one of the most significant ap- 
proaches that could help researchers discover patterns of exchange over a wide 
geographical region. Another area that merits further exploration relates to as- 
pects such as the expansion of keyword lists, the use of word embeddings for 
historical corpora and the use of word co-occurrences — since a researcher 
can never be aware of all historical variations of certain terms or all their
misspellings. 

One of the key reasons that we need to continue developing digital search 
techniques and interfaces is precisely because search is such an essential ele- 
ment within a historian’s research practices. Digital search stands closer to, 
and is more of a continuation of, analog historical scholarship than many often
think. Historians do not need to give up on their ways of practicing history via 
close reading - rather, the option of digital search can function as an extension 
of already-existing practices. 
Sam Mersch
The hybridity of living sources 


Hermeneutics and source criticism in modern place name studies 


1 Introduction 


Learning to know the neighbourhood requires the identification of significant localities, 
such as street corners and architectural landmarks, within the neighbourhood space.
Objects and places are centers of value.¹


Space is a central element in human cognition as it envelops all our being, actions,
and conceptions. From real and lived space to invented and metaphorical space, it
is a constant in human cognition.² As space constantly surrounds any human
civilization, its perception and classification are key elements of navigating that
space, either in reality, or cognitively.³ As language in its broadest sense needs a
certain consensus of common knowledge and reference, so too does language that
references space.⁴ This can either occur through a lengthy and hard-to-follow
description by an interlocutor or, as is much more common for a space coinhabited
by a community of people, by place names, also called toponyms. The notion of
place name conveys exactly the specifics of the object or area being referred to, i.e.
names for specific places.⁵ Places that can be referred to in names include


1 Yi-fu Tuan, Space and Place. The Perspective of Experience (Minneapolis: University of Min- 
nesota Press, 1977), 17-8. 

2 William Ittelson et al., An Introduction to Environmental Psychology (New York: Holt, Rine- 
hart and Winston, 1974), 98-100; Michael Maurer, Kulturgeschichte (Cologne: Böhlau, 2008), 
177-83; and Kenny Coventry, “Space,” in Cognitive Linguistics. Key Topics, ed. Ewa Dabrowska 
and Dagmar Divjak (Berlin: De Gruyter, 2019), 44-51. 

3 Martin Thiering, Kognitive Semantik und Kognitive Anthropologie (Berlin: De Gruyter, 2018), 
93-101. 

4 Maurer, Kulturgeschichte, 69-72; Bryan Lawson, The Language of Space (Oxford: Architec- 
tural Press, 2005), 15-38; Joel Kameron, “Experimental Studies of Environment Perception,” in 
Environment and Cognition, ed. William Ittelson (New York: Seminar Press, 1973), 165-7; Cov- 
entry, “Space,” 44-5; and Thiering, Kognitive Semantik, 156-60, 208-10. 

5 There is an ongoing discussion of the specificities and demarcation of place names, be they 
settlement names, rural names, or others, which would exceed the limits of what would be 


Acknowledgments: I would like to thank Wolfgang Dahmen and Serge Noiret for their insightful
feedback on earlier drafts of this paper, and the Luxembourg National Research Fund (FNR)
(10929115), who funded my research. 


Open Access. © 2022 Sam Mersch, published by De Gruyter. This work is licensed under
the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783110723991-008 




geomorphological elevations, specific houses, trees, or agricultural areas, and 
even crossroads and former settlements. In this chapter I analyze those names 
as a source in general, without focusing on specific names, apart from a cou- 
ple of illustrative examples. 

The base data are those that I amassed for my PhD research on the topono- 
mastic landscape of Luxembourg and its potential for compiling a linguistic 
history of the Luxembourgish language. In place name data, the focus is on
names of unsettled (or formerly settled) places, so-called anoikonyms or rural
names.⁶ However, for simplicity, the term place name, specifically denoting
such rural names, will be used in this chapter. 

In place name studies, the term hybridity is usually used when referring to 
names whose morphological structure exhibits elements that can be traced to 
multiple (and different) language varieties.⁷ However, this phenomenon is rather
specific to settlement names (so-called oikonyms or names of currently settled 
places), rather than rural names. Furthermore, it represents only a very small as- 
pect of the hybrid nature of place names. This chapter does not dwell on this 
general notion of hybrid names, nor does it really focus on hybrid practices or 
the need for them in general. The focus lies rather on the hybridity of the source, 
with a slight nod to such hybrid practices in digital humanities research, while 


pragmatic and productive for this chapter. Hence the discussion can be followed via, for exam- 
ple, Rob Rentenaar, “Mikrotoponymie aus nordwestgermanischer Sicht. Einige Bemerkungen 
zur Definition und Terminologie,” in Mikrotoponyme. Jenaer Symposion, 1. und 2. Oktober 
2009, ed. Eckhard Meineke and Heinrich Tiefenbach (Heidelberg: Universitätsverlag Winter, 
2011), 197-205; Teodolius Witkowski, “Probleme der Terminologie,” in Namenforschung. Ein 
internationales Handbuch zur Onomastik, ed. Ernst Eichler et al., vol. 1 (Berlin: Mouton de 
Gruyter, 1995), 288-294; and Erika Windberger-Heidenkummer, Mikrotoponyme im sozialen 
und kommunikativen Kontext. Flurnamen im Gerichtsbezirk Neumarkt in der Steiermark (Frank- 
furt am Main: Peter Lang, 2001), 102-5. 

6 See Friedhelm Debus, Namenkunde und Namengeschichte. Eine Einführung (Berlin: Erich
Schmidt Verlag, 2012), 138; and Julia Kuhn, “Rural Names,” in The Oxford Handbook of Names 
and Naming, ed. Carole Hough (Oxford: Oxford University Press, 2018), 135-6. 

7 See Gillian Fellows-Jensen, “Names and History,” in The Oxford Handbook of Names and 
Naming, ed. Carole Hough (Oxford: Oxford University Press, 2018), 516-7, 523; and Berit 
Sandnes, “Names and Language Contact,” in The Oxford Handbook of Names and Naming, ed. 
Carole Hough (Oxford: Oxford University Press, 2018), 542, 548-9. 

8 For the notion of hybrid practices and the hybrid nature of digital humanities on the exam- 
ples of digital history see Andreas Fickers, “Hermeneutics of in-betweenness: digital public 
history as hybrid practice,” in Handbook of Digital Public History, ed. Serge Noiret, Mark Te- 
beau, and Gerben Zaagsma (Berlin: De Gruyter, 2022 (forthcoming)) graciously provided by the 
author as a preprint; and Gerben Zaagsma, “On Digital History,” bmgn — Low Countries Histori- 
cal Review 128, no. 4 (2013): 3-29. 
concentrating on the similarities of the analog and the digital when concerned
with place name studies. 

I analyze the types of source that contain such place name data for their hermeneutic
potential, primarily by an external source criticism that balances the
hybridity of the source itself and its provenance or textualization. I compare ana- 
log and digital processes to get insights into common problems that both exhibit, 
and some of the advantages of one over the other, and take internal source criti- 
cism into account when demonstrating specific provenance issues of the sources. 
However, information structure hermeneutics and linguistic hermeneutics are 
omitted, as are hermeneutics of another nature. 


2 The hybridity of the source 
2.1 The place name as a source 


Rural names as place names do not exist as singular instances. Although every 
place name is a unique linguistic identifier of a given place, that name only exists
in the naming system which is used to reference all relevant space for cultural
interaction.⁹ The scope of influence of the naming systems can vary but,
typically, it is more or less limited to the speaker or user culture that uses the
microspace that is named: the latter is roughly equivalent to the people living
in a nearby settlement.¹⁰ In the past, when life was rather more bound to locality,
place names were more actively in use by the user culture(s), but had less
widespread use. 

Places can be ranked according to their respective informative values in 
space, and these rankings also relate directly to the place names and how they 
are used.¹¹ The further away a place is that is still referred to in a speaker culture,
the more important the place and its name are, in general. This also means that 


9 Peter Anreiter, Zur Methodik der Namendeutung. Mit Beispielen aus dem Tiroler Raum (Innsbruck:
Verlag des Instituts für Sprachwissenschaft der Universität Innsbruck, 1997), 145; David
Mills, Oxford Dictionary of British Place Names (Oxford: Oxford University Press, 2003), xi; and
Vincent Blanár, Theorie des Eigennamens (Hildesheim: Olms, 2001), 60-2.

10 Erika Windberger-Heidenkummer, “Kontinuität und Diskontinuität von Flurnamen. Prob- 
leme und Beispiele,” in Mikrotoponyme. Jenaer Symposion, 1. und 2. Oktober 2009, ed. Eckhard 
Meineke and Heinrich Tiefenbach (Heidelberg: Universitätsverlag Winter, 2011), 290-1. 

11 See Maurer, Kulturgeschichte, 165; and Willy Van Langendonck and Mark Van de Velde, 
“Names and Grammar,” in The Oxford Handbook of Names and Naming, ed. Carole Hough (Ox- 
ford: Oxford University Press, 2018), 33-4. 
individual very important places and their names have a much wider audience.
In fact, the closer but less important the place (in a global sense), the smaller the 
influence on speakers in general. Rural names as place names occupy the least 
influential slot in this hierarchy, as they are closely bound to very small local 
spaces, important only to a settlement area and its inhabitants. 

These place names have always been used for local narratives, for location 
in space, when referring to the known space of a settlement.¹² Hence, they are
often used for some sort of legal demarcation of the boundaries of larger areas
such as settlements, communes, or rural districts.¹³ Even though speakers of a
language can shape names in a similar fashion, resulting in some onomastic 
overlap in different geographical areas, the use of a place name while referring 
to a specific place is always unique. 

Rural names are essentially a very informal and oral onomastic category,
and they derive their usability from multigenerational tradition. When a
name is handed down from generation to generation, the human-nature relationship
is expressed by the name allotted to a place.¹⁴ This can either be on
perceptional grounds, how the community sees and interprets a landscape, or
on interactional grounds, how the community has manipulated a landscape in
order to make it more profitable to them. Since such uses and perceptions
mainly remained unchanged, the names often did, too.¹⁵ It is only when a place
does not fulfill its cultural role any more that it is abandoned — however, its 


12 For the notion of place names as narratives see Thiering, Kognitive Semantik, 321-2; Terhi 
Ainiala, “Identifying Places and Discussing Names: the Use of Toponyms in a conversation,” 
in Challenges in Synchronic Toponymy. Défis de la toponymie synchronique: Structure, Context
and Use. Structures, contextes et usages, ed. Jonas Löfström and Betina Schnabel-Le Corre (Tü- 
bingen: Francke, 2015), 33-46; and Giovanni Agresti and Silvia Pallini, “Vers une toponymie 
narrative. Récits auto-biographiques et ancrages géographiques dans deux villages de la 
Haute Vallée du Vomano (Italie),” in Challenges in Synchronic Toponymy. Défis de la toponymie 
synchronique: Structure, Context and Use. Structures, contextes et usages, ed. Jonas Löfström 
and Betina Schnabel-Le Corre (Tübingen: Francke, 2015), 21-32.

13 As can be seen from multiple narratives in Luxembourgish deeds, such as the Weisthum 
von Besch from 1541 or the Weisthum von Beaufort from 1557. See Mathias Hardt, ed., Luxemburger
Weisthümer. Als Nachlese zu Jacob Grimm’s Weisthümern (Luxembourg: Bück, 1868),
62-5 and 91-100. 

14 Maurer, Kulturgeschichte, 174-5; Sandnes, “Names and Language Contact,” 549; see also 
Ellen Bramwell, “Personal Names and Anthropology,” in The Oxford Handbook of Names and 
Naming, ed. Carole Hough (Oxford: Oxford University Press, 2018), 275-6; and Alison Grant, 
“Names and Lexicography,” in The Oxford Handbook of Names and Naming, ed. Carole Hough 
(Oxford: Oxford University Press, 2018), 575. 

15 Blanár, Theorie, 20-3.
name, having become part of local tradition, often remains. Hence, such names
can still be used to refer to space as a human cultural and cognitive expression. 

When a place name no longer reflects its initial setting, that is, when the human
relationship with the landscape described is abandoned, the name becomes a
source for historical information. Thanks to their close link to the land, place 
names can offer us a multitude of indications of different strata of information 
on how human culture has used and shaped a landscape. These strata include 
environmental interference (such as the creation of agriculturally fertile areas 
of land through draining wetlands, or through medieval deforestation), as
much as information on agriculture (what crops were grown where), and also 
on other areas such as legal history (when names refer to legal agricultural dis- 
tricts or frontiers), or language history (offering insight into everyday language 
beyond evidence from the administrative jargon in historic deeds). 


2.2 The living and the petrified 


The documentation of place names in general, and in Luxembourg particularly, 
is quite varied. As these names refer to specific places, they can be mentioned in 
early property deeds when land plots are concerned. The names are specifically 
mentioned in such cases because they were legally binding. It was only the place 
names that made it possible to refer to and identify a place in such documents. 
Even though the names are no longer legally binding they are still mentioned as 
a reference point, through tradition. Some of these mentions date back over a 
millennium, such as that for heliberc (meaning “healing mountain”) in 902,¹⁶
this being modern-day Helperknapp, located just west of the geographical center
of the modern Grand Duchy. It is named as such due to the still recognized legend
of a healing well at the top of the hill, which mentions the visits of Charlemagne¹⁷
and Willibrordus, the Northumbrian missionary saint,¹⁸ but has a


16 Maurits Gysseling, Tom Jozef de Herdt, and Jozef van Loon, Toponymisch Woordenboek
van België, Nederland, Luxemburg, Noord-Frankrijk en West-Duitsland, (vóór 1226) (Brussels: 
Belgisch interuniversitair centrum voor Neerlandistiek, 1960), 471. 

17 Nicolas Gredt, Sagenschatz des Luxemburger Landes. Vollständig durchgesehene und überar- 
beitete Neuausgabe unter Einbindung des Registerbandes von A. Jacoby, J. Dumont, L. Senninger 
und H. Rinnen erstellten Registers (Luxembourg: Institut Grand-Ducal - Section de Linguisti- 
que, d’Ethnologie et d’Onomastique, 2005), 66. 

18 See Hartmann Melzer and Otto Wimmer, Lexikon der Namen und Heiligen (Hamburg: Nikol, 
2002), 857-8. 
much older history. The narrative regarding Willibrordus is still recognized
each Pentecost with a yearly procession to the hilltop. The oral name tradition is 
still very active, with many place names having been spawned with the linguistic 
motif of that hill — all relaying the idea of a procession to the mountain. But only 
the hill itself has been mentioned in historic documents and, apart from two 
tenth century mentions, only from the fifteenth century onward.²⁰

Most place names, however, were transmitted solely orally
until the nineteenth century, and were only written down when the first land
registries emerged.²¹ This cataloging was begun in Luxembourg in 1795, by the
Département des Forêts, when Luxembourg was still under French occupation,
and was finished in the second half of the nineteenth century.²² This constituted
the first emergence of a quantitatively relevant collection of place names
for Luxembourg and later became the basis for the modern place name data- 
base of the Administration du cadastre et de la topographie (the Luxembourg 
land registry or cadastral office). The names became binding once they were 
written down, which means that in any official documentation the name would 
emerge in the exact form of the land registry entry. This of course creates a few 
specific problems, the first being the way these names were collected. 

When a name is written down in a distorted form and then becomes official, 
it does not reflect the name and language use at the given place. Technically, 
this is not an issue when the “unofficial” names are in frequent use, and an offi- 
cial name and its dynamic form can coexist side by side. However, after World 
War II, with less of the population being employed in the agrarian sector, much 
of the local landscape knowledge, including place name knowledge, was lost.²³
It was the knowledge about the true local name as it was spoken that was lost, 


19 André Schoellen, “Zeugenberg Helperknapp. Neue archäologische Erkenntnisse zu dieser
herausragenden Fundstelle,” Nos Cahiers 34, no. 3 (2013): 207-21. 

20 Denise Besch, Vu villa bis Weiler, vu fréier bis haut: Suffixe der Luxemburger Ortsnamen 
(Luxembourg: Institut Grand-Ducal — Section de Linguistique, d’Ethnologie et d’Onomastique, 
2018), 248; and Joseph Meyers, Studien zur Siedlungsgeschichte Luxemburgs, 3rd ed. (Luxembourg:
Krippler et cie, 1976), 108.

21 Debus, Namenkunde, 141. 

22 Alphonse Eyschen, “Das luxemburgische Kataster. Sein Ursprung, seine Entwicklung bis 
zum heutigen Tag. Vergleich mit den Katastern der beiden Beneluxpartner,” Tijdschrift voor
Kadaster en Landmeetkunde 74, no. 3 (1958): 151-7; and Administration du cadastre et de la 
topographie, Dates de l’achèvement des plans-minutes, copy of a typewritten notice summarizing
the dates of the establishment of the first land registries, Administration du cadastre et de
la topographie, Luxembourg. 

23 See also Damaris Nübling, Fabian Fahlbusch, and Rita Heuser, Namen. Eine Einführung in 
die Onomastik (Tübingen: Narr, 2015), 239. 
together with its possible link to the landscape. The form that was written down
persists, “petrified” in whichever way it was originally documented. Hence, the po- 
tentially distorted forms can also serve as a basis for place name knowledge for a 
population that does not possess this traditional knowledge any more, and thus 
the petrified names, written down for economic purposes, also function as a lin- 
guistic and cultural marker of place again. There is still a need to reference space 
in a cultural landscape. In Luxembourg, for example, many of these place names 
still fulfill their original purpose of referring to space — even though a lot of the 
knowledge was initially lost — with older place names used when naming bus and 
tram stops, and also serving in a name-giving function as new industrial and real 
estate landscapes are created. The petrified forms are thus revived and continue to 
function as a living source, even though this is a distorted image of the original. 

Place names constitute a living and constantly used source. The loss of oral 
tradition is only problematic when the written documentation distorts the 
name’s identity, with some misinterpretations of a name leading to a distortion 
of past cultural identity too. An example of this is the name originally
spelled Horekaul, literally a hollow, sometimes flooded, used in linen
production. The first word is etymologically connected to the word “hair”, as in
the threads woven into cloth. However, the agent writing down the name for
the land survey seemingly did not know this specific cultural background and
wrote the name down as Hurenkaul, literally interpreting it as “well of the
whores.”²⁴ Such clearly misinterpreted names, however, are not that common.
More often, names have been handed down orally for such a long time before 
they were ever documented, that the original cultural knowledge reflected in a 
place name has been lost due to cultural changes. An example of this is the 
place name Verluerekascht, which occurs in many places, but is best known in 
Luxembourg City (in the Bonnevoie district). The name initially hints at the 
ruins of a derelict castle, its literal meaning being “lost castle.” But later on, 
when the link to the castle was no longer culturally relevant, it was reinterpreted
as a place of “lost food,” which is also hinted at in the earliest land registry
documentation due to its German translation as Verloren Kost.²⁵


24 See Pierre Anen, Luxemburgs Flurnamen und Flurgeschichte (Luxembourg: Sankt Paulus 
Druckerei, 1945), 23. 

25 It is not clear if the reinterpretation had already occurred prior to the documentation in the 
first land registry or was a result of it. However, the wide dispersal of the name, as well as the 
regular development of the Latin castellum into the Luxembourgish Kascht, and the homophony 
with the Luxembourgish Kascht “food,” suggests that the reinterpretation occurred prior to the 
first documentation, as the error seems to be widespread in many German forms of the name. 
Place names are difficult to put into historiographical source categories. The
oral character of place names shows both of the characteristics of the traditional 
historiographic division into tradition and remnant.²⁶ The oral knowledge of
place names served the specific purpose of portraying space, and maintaining 
and sharing that knowledge within the community, thus making it clearly part of 
the tradition category, as it was an intentional means of conserving the knowledge,
be it only in oral tradition.²⁷ However, when the name is preserved, but not
the cultural link it initially portrayed, it is in fact a remnant, as the initial goal of 
creating the name was not to archive knowledge in case it got lost or forgotten.²⁸
The documentation of oral names, however, has the clear goal of preserving in- 
formation — making it part of the tradition branch of source categorization — 
even though the documentation was not originally intended to preserve the ini- 
tial knowledge, which was the reference in space itself. This reference was only 
documented in order to link to the cultural conception of place, so as to be able 
to link ownership with space and hence tax revenues. The documents then inad- 
vertently become remnants because they convey information about past cultural 
events, linguistic history, identity, and other cultural influences, without that 
having been the intention.

In the end, the only seriously detrimental outcome of the loss of the oral tradi- 
tion is when no documentation exists at all, resulting in absolute loss of knowledge, 
which of course cannot be documented or scientifically evaluated in any form. 


3 External source criticism of Luxembourgish 
place names 


3.1 Hybrid provenances 


The initially oral character of place names has already been mentioned above 
but, when it comes to provenance, there is not really much that can be said. As 
discussed, the names do not exist by themselves, but only in collections of all 
place names as a reference for the cognitive impression of space in a specific 
community. And as names themselves are a linguistic expression, they also 


26 As in the German coining of Tradition and Überrest by Ahasver von Brandt, Werkzeug des
Historikers. Eine Einführung in die historischen Hilfswissenschaften (Stuttgart: Kohlhammer,
1966), 58-75. 

27 Von Brandt, Werkzeug des Historikers, 71-4. 

28 Von Brandt, Werkzeug des Historikers, 66-71. 
adhere to linguistic rules. While a person can try to coin a word or a name for a
specific purpose, the usefulness of that linguistic instance is only proven if some- 
body else can interpret it and put it into linguistic relation. Language is a com- 
munication system and hence is the common reference of place names. Thus, it 
is not really possible to pinpoint a specific inventor or author of a name, as the 
existence of the name is strictly linked to its broad acceptance. Technically, that 
means that authorship of a name lies with the wider speaker (or user) culture, 
even when a single specific person invented it first. The name is only valid when 
it is accepted as a semiotic entity in multiuser communication. Given this situa- 
tion and the oral tradition, it is rarely possible to narrow the authorship down 
further than to a specific speaker group, possibly a dialectal subvariety. However, 
this narrowing down is already part of internal source criticism, as is the relative 
chronology of the names, which often has to be constructed via linguistic methodology,
specifically the historical comparative method; thus it will not be discussed
further here.²⁹

Even though place names are essentially an oral (and living) source, they 
are mostly only tangible for scholarly purposes when documented in writing. Retracing
their provenance (and transmission history) can broadly be split into
two categories: textual and digital. 

In place name research, textual provenances exhibit the same problems as 
any other material or textual sources. Apart from the first land registries, or pos- 
sibly an early rural map, older instances of place names can only be found in 
legal deeds. As such, the place name has to be considered using the same criteria 
of source criticism as the deed, i.e. what material was the deed written on, who
wrote it, for what purpose, and when? However, there is one key element
that needs to be distinguished. Regarding the deed itself, the parties involved 
have rather to be looked upon within the scope of internal source criticism, i.e. 
by considering the textual evidence. For the name, that evidence would be the 
exact graphematic transcription of the name (an aspect not discussed in this 
chapter). However, there are a few other key data that need to be evaluated as 
part of the external source criticism. First comes the issue of where the places are 
to be located. Place names are used to refer to space, so the narrative of the 
space is very much part of the external criteria for the precise allocation of the 


29 I am preparing a full overview of hermeneutical aspects and source criticism concerning 
Luxembourgish place names, including internal source criticism, especially linguistic herme- 
neutics, as part of my PhD thesis. 
names. A place name can only be truly analyzed if its whereabouts are known.³⁰
More important is the why. This is not the same as for the deed itself, as in why 
the contents of this document were written down, but rather why these names in 
particular were used. When a specific plot of land is mentioned by name as the 
object of a transaction, this makes answering that question relatively easy. 
When, however, place names are used to refer in space — meaning to give a path- 
way, as for example regarding where a legal frontier is to be located - the corre- 
spondence is different. Why were these names chosen and not others? Were they 
more important or better known? Were there only that many names or is this just 
a selection of a few? Who chose the names to be used as reference markers? The 
general scope of external source criticism is the same as for internal criticism, 
but the details differ. 

There is one general map for Luxembourg from the eighteenth century that 
covers all of the modern Grand Duchy’s rural terrain. It is the so-called Ferraris 
map that covered, among other areas, the old duchies of Luxembourg and Limburg.³¹
It is the oldest map of Luxembourg that covers all rural areas and it also
exhibits a few hundred place names. The who, what, and where are generally 
well known. It is a military map³² covering the Austrian Netherlands, which was
started circa 1770 and finished by 1778, under the supervision of Count Joseph de 


30 The occurrence of a place name in a deed does not guarantee its exact location, which is 
sometimes needed to discern the meaning of the name. When looking at the Weisthum von 
Beaufort (1557), we can identify a place name Weigerwiesz, which at first glance seems to denote
a possessive relation of a pasture plot; see Mathias Hardt, ed., Luxemburger Weisthümer.
Als Nachlese zu Jacob Grimm’s Weisthümern, 64. The place cannot be located as such today.
However, the modern land registry files offer a place called Weiherwies in the village of Beau- 
fort, which is located in the vicinity of a river. This leaves the conclusion that the “g” of the 
name from the deed was in fact not pronounced (or only very slightly) and that there is no 
personal name to be identified with it (and hence no possessive relation), but rather the lexeme
Weier, which denotes a pond. See Ministère de la Culture, ed., “LOD,” 2007, s.v. Weier,
accessed August 31, 2020, http://lod.lu (hereafter cited as LOD). Hence, the identification of 
an exact place through the digitally available modern data helps in discerning the etymology 
of a place. See also geoportail.lu, “Zoom in on place name Weiherwies in Beaufort,” accessed 
August 31, 2020, https://map.geoportail.lu/theme/main?lang=fr&version=3&zoom=18&X=698408&Y=6415761&layers=320&opacities=1&bgLayer=basemap_2015_global.

31 For more information, see geopunt.be, “Metadata - Ferraris kaart - Kabinetskaart der Oostenrijkse
Nederlanden en het Prinsbisdom Luik, 1771-1778,” accessed May 6, 2020,
https://metadata.geopunt.be/zoekdienst/apps/tabsearch/.?uuid=2d7382ea-d25c-4fe5-9196-b7ebf2dbe352.
A digital edition of limited resolution can also be accessed; see Bibliothèque
royale, Koninklijke Bibliotheek en. “Ferraris map,” accessed August 31, 2020,
https://www.kbr.be/en/the-ferraris-map/.

32 This is a map consisting of 275 separate sheets. 
Ferraris.³³ It is important to note that, as a military map, it was supposed
to highlight the military potential of the landscape and, as with all maps used
in onomastic research, needs the right kind of scrutiny.³⁴ This military purpose is also
reflected in the choice of the place names that were selected for the map.³⁵

Most place names, however, are to be found in place name collections, in- 
cluding land surveys by cadastral offices and private collections. For the study 
of Luxembourgish toponymy, there are two major collections that exhibit differ- 
ent characteristics in how they were collected. 

The most important collection for Luxembourg is that of the national cadas- 
tral office. The way in which the names in this collection were documented has 
been described above. However, a key feature that needs to be highlighted here 
is that, in the case of Luxembourg’s early land registries, both the names of
the surveyors and the years of their surveys are known.³⁶ The original (textual)
land registries also sometimes give an indication of the people who provided 
information for specific surveys. What they do not record is the names of the persons
collating the surveys over time; neither is there detailed documentation of
copies or the variations in these copies, some of which may be errors, some per- 
haps legitimate changes. The most important hermeneutical insight regarding 
this source is linked with the digitization of the cadastral offices that occurred 
in Luxembourg in the 1980s.³⁷ Here the source underwent a technical transformation
of the media that it used. The handwritten registries still exist as archival
material but the land registry database itself was converted into a digital
format, which is mostly concerned with geographical and geomorphological 
data, but also records place names. As far as is known, however, there is no 


33 See Malte Helfer, “Ferraris-Karte (1771-1777),” 2008, accessed May 6, 2020, https://gr-atlas.uni.lu/index.php/de/articles/ge57/fe102.
Little is known about the original surveyors who delivered
the handwritten surveys containing the geomorphological data for the map.

34 Anreiter, Zur Methodik der Namendeutung. Mit Beispielen aus dem Tiroler Raum, 57. 

35 Most names refer to natural resources useful for a traveling battalion, such as forests. 
Others refer to favorable geomorphology, as in names of mountains to navigate or hide in, as 
well as hiding places like valleys that were easy to defend. A few also render the usefulness of 
the land, mentioning land plots used annually for specific staple crops. 

36 See Administration du cadastre et de la topographie, ed., Administration du cadastre et de 
la topographie, Grand-Duché de Luxembourg. Cinquantenaire 1945-1995, (Luxembourg: Service 
information et presse du gouvernement, 1996), 13-6; and Administration du cadastre et de la 
topographie, Dates. The names of the surveyors were always mentioned on the cadastral maps 
drawn up, most of which are accessible via geoportail.lu, “Urplang JPG 400,” accessed May 6, 
2020, https://map.geoportail.lu/urplang/JPG_400/. 

37 Administration du cadastre et de la topographie, Cinquantenaire, 61. 
documentation on what transitions were made, except for the institution of the
initial systems and some broad administrative choices.³⁸ Nor is there information
on the identity of the specific users who maintained and changed the data.
As a land registry is a living source, with plots being changed from time to
time, this also leads to the loss or creation of some names. As far as is known,
however, there is no equivalent of the “Wayback Machine” that has tracked all the
changes made within the different systems and software versions used since
digitization.³⁹ No need seems to have arisen to establish detailed documentation
of the technical exploits and renderings of the data, nor to publish it in an open
setting, which is quite indicative, given that the cadastral office is the most
prolific data contributor on Luxembourg’s national open data portal.⁴⁰ A
methodological description of the technical aspects and changes seems to have been
deemed either unnecessary or, perhaps, unfeasible, given how many changes have
occurred in computing over the last four decades. However, land registries need
not be an exception, as such transitions and subsequent changes have been
successfully documented in many sectors, both public and private.

Another very important collection of place names for Luxembourg is that of 
the Institut Grand-Ducal — Section de linguistique, d’onomastique et d’ethnologie, 
which offers initial oral fieldwork data, together with official land registry corre- 
spondences. The collection itself has mostly been lost, with just a few pages hav- 
ing been rediscovered in 2019. From these originals, it was possible to discern that 
the oral survey was undertaken via official channels, with local district commis- 
sioners gathering both cadastral and oral names from the mayors of all communes 
in the year 1935. There is no indication as to why and on whose orders this
collection was undertaken, the original directive being untraceable for now. A
complete version of the collection does exist in the form of a copy, however: the
printout of a database file created by a former member of the institute. The source
database file has, unfortunately, been lost. The creator of the file, now a nonagenarian,
has not been able to give a detailed account of its documentation and can only 
remember some of the processes involved — although he maintains that the print- 


38 Administration du cadastre et de la topographie, Cinquantenaire, 61-2. 

39 However, ArcGIS, one of the most widespread software implementations of geographical 
information systems, is (or at least was in 2018) apparently also used by the Luxembourg ca- 
dastral office. 

40 See Data.public.lu. Luxembourg data platform, “Data Sets.” By the Government of the 
Grand Duchy of Luxembourg, accessed May 6, 2020, https://data.public.lu/fr/datasets/. 
out was not generated by him and exhibits some false data entries.⁴¹ On comparison
with the surviving originals, it can be established that about a third of the
printout consists of duplicate entries that are not found in the original
manuscripts. Although the database creator asserts that he transferred every place
name instance into the database, close scrutiny of the remaining originals
suggests that about 5% of the original entries were not transferred. This data
source thus lacks complete documentation and suffers from discrepancies of
provenance, textual changes, redundant entries, and the loss of most of the
original data. Even though the data represented in this source are highly
interesting, and though they represent the only dataset that contains a
quantitatively relevant number of place names in local vernacular, the source
still needs to be closely scrutinized due to its problematic provenance.
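
The kind of comparison described above can be pictured with a short sketch. It is only an illustration under stated assumptions: the name lists are invented and not taken from the collection, the function name `compare_copy_to_original` is hypothetical, and the procedure actually followed in the archive is not documented here. The sketch counts entries that are repeated in a copy and original entries that are absent from it.

```python
from collections import Counter


def compare_copy_to_original(original, copy):
    """Estimate redundant entries in a copy and original entries missing from it."""
    copy_counts = Counter(copy)
    # Every occurrence beyond the first is counted as a redundant entry.
    duplicates = sum(n - 1 for n in copy_counts.values() if n > 1)
    # Original entries that never appear in the copy.
    missing = sorted(set(original) - set(copy_counts))
    return {
        "duplicate_share": duplicates / len(copy) if copy else 0.0,
        "missing_share": len(missing) / len(original) if original else 0.0,
        "missing": missing,
    }


# Invented sample data, for illustration only.
original_names = ["Op der Heed", "Am Bësch", "Bei der Millen", "Op dem Bierg"]
printed_copy = ["Op der Heed", "Am Bësch", "Am Bësch", "Bei der Millen"]

report = compare_copy_to_original(original_names, printed_copy)
print(report)
```

Such a count is naive: a name that legitimately occurs twice in the originals (for instance in two different communes) would be flagged as a duplicate, so every hit would still have to be checked against the manuscripts.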


3.2 Misspelled and wrongly encoded 


Despite the hybrid provenance of Luxembourgish place name sources, there 
are still quite a few areas that are common to both analog and digital sources, 
specifically regarding textual mistakes occurring due to external factors. 

When a scribe misspelled a name in historic deeds, the information that was
supposed to be contained in the document was accidentally distorted.⁴² When that
name was then copied in this distorted form, the misspelling became tradition and
changed the perception and reference of that name.⁴³ When there is no oral
evidence to counter this minute distortion, it becomes strengthened and
generalized, and the identity of the place name is permanently changed.

The same is the case with non-analog approaches. As seen above, the ca- 
dastral office in Luxembourg switched to digital systems in the 1980s. This 
meant that all place names were digitized by typing them in manually. Human 
error always occurs, whether data is handwritten or typed into a computer, but 
the potential problem lies in the level of trust placed in the veracity of the 


41 It is unclear what implications this assertion has for the veracity of the data and its prove- 
nance, as the creator did not specify further even after repeated enquiries. 

42 Karin Schneider, Paläographie und Handschriftenkunde für Germanisten. Eine Einführung
(Berlin: de Gruyter, 2014), 149-51. 

43 A good example is found in the official, non-Luxembourgish forms for the name Luxem- 
bourg, where the “x” is actually the result of a phonemic misinterpretation from the seven- 
teenth century, and it should be read as an /s/ not /ks/, as in the name Brussels, French 
Bruxelles. See Christian Kollmann, “Woher kommt das x in Luxemburg?,” Beiträge zur Name- 
nforschung 46, no. 2 (2011): 165-210. 
computer data. An important issue here relates to the limits of character encoding
at the time. The first Unicode character chart was only devised in the late
1980s, while the earlier American Standard Code for Information Interchange
(ASCII) was originally developed in the 1960s and was widely used even in
countries whose writing system or typography differed considerably from that of
American English. For the study of Luxembourgish place names specifically,
this means that certain characters in the Luxembourgish alphabet could not be
displayed, out of sheer technical impossibility. This has many repercussions,
for example the lack of diacritics or of any non-ASCII characters. Take the
Luxembourgish word for forest, Bësch: it was often transcribed in the first land
registry as Büsch. When the land registry was digitized, something had to be done
about the diacritic character <ü>. So, on occasion, forest was spelled Buesch
(<ue> being a common way to represent the vowel <ü> in German⁴⁴), or the
graphematic difference was simply omitted, making it Busch. This may not seem
hugely significant but, from a linguistic standpoint, the origin of the form
changes depending on whether the name is written with <ü>/<ue> or with <u>. The
effect was that the name became distorted. In many cases, the distortion was kept,
making it the official form of that name.⁴⁵
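
The two strategies mentioned above, writing a digraph or simply dropping the diacritic, can be sketched in a few lines. This is a minimal illustration, not a reconstruction of what the cadastral office actually did: the mapping table and both function names are assumptions made for the example.

```python
import unicodedata

# Digraph convention for German umlauts (assumed mapping for this sketch).
DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def to_ascii_stripped(name):
    """Drop diacritics by decomposing characters and discarding combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is silently dropped.
    return base.encode("ascii", "ignore").decode("ascii")


def to_ascii_digraph(name):
    """Replace umlauts with digraphs first, then strip whatever diacritics remain."""
    replaced = "".join(DIGRAPHS.get(ch, ch) for ch in name)
    return to_ascii_stripped(replaced)


print(to_ascii_digraph("Büsch"))   # Buesch
print(to_ascii_stripped("Büsch"))  # Busch
# After stripping, the two source forms can no longer be told apart.
print(to_ascii_stripped("Büsch") == to_ascii_stripped("Busch"))  # True
```

The last line shows why the omission matters for the onomastician: once the diacritic is stripped, a form that originally had <ü> is indistinguishable from one that always had <u>.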

A similar issue can be seen in the copy of the Institut Grand-Ducal’s collec- 
tion. The problems of this source’s provenance have already been discussed, 
but less so the presentation of the names in this copy. The names are shown 
here in capital letters only, except for the diacritic forms. The full capitalization 
is of course already a form of distortion, not relaying the true graphematic 
image of the original — but it is one that can generally be ignored, as it does not 
produce distortions per se. However, in some cases, diacritics from the originals 
seem to have been ignored and not rendered in any form, resulting in the same 
distortion as discussed for the cadastral data. It has to be kept in mind that this 
file was created in 1990 (at least according to the title allotted to the collection) 
and that Unicode compatibility was not yet widespread then. The author of the 
lost digital file that preceded the printed collection was able to code some dia- 
critic characters which, incidentally, were never capitalized. The diacritics do 
not seem to mirror the originals though, as can be seen from the few remnants 
that still exist in the archive of the Institut Grand-Ducal. The author had there- 
fore used an unknown, undocumented encoding system, while ignoring some 


44 See Schneider, Paläographie, 94-5. 

45 This discrepancy could only be revealed due to an official data file, established separately 
by former cadastral officials, now residing at the cadastral office and generously made avail- 
able to me. 
graphematic features and changing others. These discrepancies can only be
identified where there are corresponding originals in existence. 

The analog and the digital are also comparable as regards mistakes made 
by interpretation. A scribe might have transcribed a name accurately, but the 
reader or copyist might have misinterpreted a sign or character and copied it 
wrongly. This can, of course, also occur when copying from analog to digital but, 
in the end, it constitutes the same kind of human error. However, it is different 
when encoding systems come into play. As has already been hinted at, such systems
changed as computing advanced. The transfer from one such system to another could
result in misalignments, creating different forms, as can be seen in the place
name GonneschwAnkel, for example.⁴⁶ The character <A> does not exist in
Luxembourgish or any language varieties in its vicinity and is supposed to mark a
diacriticized e, possibly <é>. Misinterpretations can thus occur due to human
error or, in this case, computer error.
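
How such a misalignment arises can be illustrated with a small sketch. The encodings actually involved in the sources discussed here are unknown and undocumented, so the pairs below (UTF-8 read as Latin-1, code page 437 read as Windows-1252) are assumptions chosen only to show the mechanism; the helper `reinterpret` and the sample word are likewise hypothetical.

```python
def reinterpret(text, written_as, read_as):
    """Encode text in one character encoding and decode the same bytes in another."""
    return text.encode(written_as).decode(read_as, errors="replace")


sample = "café"  # generic sample word containing <é>

# The two bytes of UTF-8 <é> are read as two separate Latin-1 characters.
print(reinterpret(sample, "utf-8", "latin-1"))  # cafÃ©

# The single DOS byte for <é> maps to a different character in Windows-1252.
print(reinterpret(sample, "cp437", "cp1252"))   # caf‚
```

In both cases no human mistyped anything: the bytes stayed the same and only the table used to read them changed, which is the sense in which the distortion is a "computer error".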


3.3 Analog vs. digital — the lack of documentation 


Lack of documentation is a serious issue for both analog and digital sources, as 
seen above. With analog documents, the older they are, the less problematic 
this often seems, as they generally contain far less data — especially when com- 
pared to the vast quantity of digital data that exists, be that digitized, born digi- 
tal or, as in the case of the cadastral office, a hybrid form that is part digitized, 
part born digital. The fast-growing and ever-changing digital landscape makes 
the need for documentation even more pressing for any historian or linguist, or 
indeed for any researcher using data that can be used in an historical analysis. 

This fast pace of change enabled by digital methods and tools also entails a
higher risk of data loss. Permanent changes that are not documented can lead to
irreversible loss of information, which could be catastrophic. The loss or absence
of data is inherent to historiographical work, but it is the quantity of data
handled and possibly lost that is the key difference between modern born digital
and traditional sources.

Version control will be the most important heuristic feature for the onomastician
(historian and linguist alike) using born digital or hybrid digital sources,
if their analysis is to extend beyond just the final product. There is a
tendency in modern historiography to write not only a historical narrative of 
the facts, but also of the intermediate steps, as well as the motivations exerted 


46 This place is located in Bissen, a village in the middle of the Grand Duchy. 
and decisions taken by any agents involved in the process. By doing so, a kind
of cultural and workflow history can be established, something that has not yet 
been attempted on a global scale. 


4 For the future (synthesis) 


In discussing specific sources for place name studies in Luxembourg, it can be 
maintained that key issues subsist in both analog and (born) digital sources, as 
well as those sources that started out as analog, were digitized and then en- 
hanced digitally. I have made a case for the hybridity of place name sources, 
starting from the initial oral character of place names as a source and their 
function in a cultural environment. I highlighted the issues and methodological 
problems that arise when writing down and preserving these names, as well as 
the living nature of some of these sources. 

Provenance studies of sources, whether those sources are analog or digital, always
suffer from the same key issues regarding lack of information. When the
documentation of a source is incomplete, or totally lacking, the ability to
assess the provenance of that source, along with all the intermediate steps that
might have changed it, is severely hindered.

Although the key aspects are the same for the external criticism of both an- 
alog and digital sources, the pace of possible and actual changes effected in the 
digital realm makes these source types more complicated, or at least more
laborious, to deal with. When external source criticism comes to a standstill
because of a lack of documentation, the use of internal source criticism might be
the only way to further examine the origin of a source, be it analog or digital.

When archival practices remain focused solely on the preservation of a final
document, failing both to record the intermediate steps and changes the document
has undergone and to archive the software tools previously used with it, the
study of provenance will always be unsatisfying. Even though it may be
argued that the software itself that is, or was, used by an institution is not their 
property to archive, at the very least the recording of a coherent and consistent 
software version history should be considered a must for the future historian. 
After all, the end result or product is only one facet of the source, a facet that 
cannot by itself convey all the changes, decisions, or problems that that docu- 
ment has encountered over its lifetime. 
Jan Lotz
Reconstructing Roman trade networks 


An experiment in approaching fragmented sources with 
network analysis 


This paper focuses on my study of trade and transport networks in the Gaulish
and German provinces during the Roman Empire and on the challenges of applying
network analysis to fragmented or uncertain sources; it is based on my own
perspective on, and experiences during, this study. I first give a brief overview
of digital approaches in ancient history, followed by an introduction to the
sources, some remarks on epigraphy in the digital age, a discussion of the
application of network analysis and its difficulties, and finally some concluding
thoughts.

No matter how spectacular, innovative, or promising new digital tools, 
methods, and ways of conducting research may seem, the most important thing 
is to have a critical mindset, especially regarding the digital, the sources, and 
the way the digital is applied to those sources. This leads to an urgently 
needed combination of source criticism and tool and method criticism. While 
there is no point in denying the opportunities the digital turn offers, or that it
will change today’s academic landscape, it is not the panacea for historiography
and should not be treated as such.²


1 Sybille Krämer adds the research question as a crucial criterion regarding the applicability
of digital humanities in Sybille Krämer, “Der ‘Stachel des Digitalen’ — ein Anreiz zur Selbstre- 
flexion in den Geisteswissenschaften? Ein philosophischer Kommentar zu den Digital Humani- 
ties in neun Thesen,” Digital Classics Online 4, no. 1 (2018): 8. 

2 Michel de Certeau, Histoire et psychanalyse entre science et fiction (Paris: Gallimard, 1987), 
66-96; Erez Aiden and Jean-Baptiste Michel, Uncharted. Big Data as a Lens on Human Culture 
(New York: Riverhead Books, 2013); and Stylianos Chronopoulos, Felix Maier, and Anna Novo- 
khatko, Digitale Altertumswissenschaften. Thesen und Debatten zu Methoden und Anwendungen 
(Heidelberg: Propylaeum, 2020), 10. 
1 Digital approaches in ancient history


Charlotte Schubert and Corina Willkommen ascribe an “internationally very visible
pioneering role in the development of the e-humanities or digital humanities”
(international sehr sichtbare Pionierrolle in der Entwicklung der eHumanities
bzw. Digital Humanities) to classical studies and consider them well advanced in
terms of digitization.³ But there is also skepticism and rejection regarding the
hype around digital humanities. The development is seen as the replacement of
“interpretation as a key competency in the humanities” (Interpretation als
geisteswissenschaftliche Schlüsselkompetenz) by mathematical methods.⁴ Sybille Krämer
reminds us that the digital humanities are still humanities and thus a “humanities
subdiscipline” (geisteswissenschaftliche Teildisziplin).⁵ There is also criticism that
barely any cooperation exists between digital humanities and classical studies.
Furthermore, some historians see the digital humanities as “a possibility of renewal”
(Möglichkeit der Erneuerung) and “the humanities’ only chance of survival”
(einzige Überlebenschance [...] der Geisteswissenschaften), while others are critical of
these developments and warn against the uncritical projection of the epistemic
principles of the natural sciences onto the humanities.⁶ In ancient history studies,
many digital projects focus on the creation of digital editions of ancient texts.⁷ A
famous digital project on ancient economic history is ORBIS, which has been
heavily criticized.⁸ Regarding network analysis, which will be discussed in detail
in Section 4 as the approach in my research, Christian Rollinger considers ancient
historians to be “(fashionably) late to the party.”⁹ It was not until the early 1990s


3 Charlotte Schubert and Corina Willkommen, “Alte Geschichte,” in Clio Guide — Ein Hand- 
buch zu digitalen Ressourcen für die Geschichtswissenschaften, ed. Laura Busse et al. (Berlin:
Humboldt-Universität zu Berlin, 2018), C.1-5. 

4 Krämer, “Der ‘Stachel des Digitalen,’” 7. 

5 Krämer, “Der ‘Stachel des Digitalen,’” 6. 

6 Chronopoulos, Maier, and Novokhatko, Digitale Altertumswissenschaften, 10. 

7 For example, the Perseus Digital Library, accessed September 8, 2021,
https://www.perseus.tufts.edu/hopper/ and papyri.info, accessed September 8, 2021, http://papyri.info/, as well as
several epigraphic databases. 

8 “ORBIS: The Stanford Geospatial Network Model of the Roman World,” https://orbis.stanford.edu/;
Pascal Warnking, Der römische Seehandel in seiner Blütezeit. Rahmenbedingungen,
Seerouten, Wirtschaftlichkeit (Rahden: Marie Leidorf, 2015), 178-83; Leif Scheuermann, “Ge- 
schichte der Simulation / Simulation der Geschichte. Eine Einführung,” Digital Classics Online 
6, no. 1 (2020): 16-9; and Ullrich Fellmeth, “Möglichkeiten und Grenzen der Quantifizierung 
und Modellierung von antiken Handels-Transportbedingungen — aus ökonomischer Sicht,” 
Digital Classics Online 6, no. 1 (2020): 137-9. 

9 Christian Rollinger, “Prolegomena. Problems and Perspectives of Historical Network Re- 
search and Ancient History,” Journal of Historical Network Research 4 (2020): 2. 




that network analysis was applied to studies in ancient history, for example by
Michael Alexander and James Danowski, who analyzed Roman society based on
letters written by Cicero.¹⁰ At the beginning of the twenty-first century, network
analysis was still rarely used in historiography, but its use has increased
significantly in recent years.¹¹ Rollinger, who uses social network analysis to
examine friendship and connections among the Roman upper class during the late
Roman Republic, criticizes the often metaphorical use of the term “network” in
(ancient) historiography, but also notes a turn toward the actual methodology of
network analysis.¹² An example of the metaphorical use of “network” without deeper
analysis comes from Wim Broekaert, who uses it to describe connections between
individuals or families, but refrains from further investigation.¹³ Network
analysis performed on fragmented sources, however, is hardly discussed in the
literature on historical network research.¹⁴

When I studied ancient history and archaeology there was no mention of 
anything related to digital history, nor indeed network analysis. I had never 
heard of either of these terms before embarking on my dissertation and was
quite skeptical about what computer science could do for historiography and
whether there could be any advantages to combining the two. They seemed worlds


10 Michael Alexander and James Danowski, “Analysis of an ancient network. Personal com- 
munication and the study of social structure in a past society,” Social Networks 12 (1990), 
313-35. 

11 For an overview of ancient history and network analysis see Christian Rollinger, Amicitia
sanctissime colenda. Freundschaft und soziale Netzwerke in der Späten Republik (Heidelberg:
Vandenhoeck & Ruprecht, 2014), 367-81. For a more general view see Linton C. Freeman, The
development of social network analysis. A study in the sociology of science (Vancouver:
Booksurge Publishing, 2004). More recently, see Christian Nitschke, “Die Geschichte der
Netzwerkanalyse,” in Handbuch Historische Netzwerkforschung. Grundlagen und Anwendungen, ed.
Marten Düring et al. (Berlin: Lit Verlag, 2016), 11-29; and Matthias Bixler, “Die Wurzeln der
Historischen Netzwerkforschung,” in Handbuch Historische Netzwerkforschung, ed. Düring
et al., 45-61. For a regularly updated bibliography regarding network analysis in ancient
history see “HNR Bibliography: Ancient History,” accessed November 3, 2020,
https://historicalnetworkresearch.org/bibliography/#Ancient%20History.

12 Rollinger, Amicitia sanctissime colenda, 353-54; and Christian Nitschke and Christian Rollinger,
“‘Network Analysis is performed.’ Die Analyse sozialer Netzwerke in den
Altertumswissenschaften: Rückschau und aktuelle Forschungen,” in Knoten und Kanten III.
Soziale Netzwerkanalyse in der Geschichts- und Politikforschung, ed. Markus Gamper,
Linda Reschke, and Marten Düring (Bielefeld: transcript, 2015), 229-30.

13 For example, Wim Broekaert, Navicularii et negotiantes. A prosopographical study of Roman 
merchants and shippers (Rahden: Marie Leidorf, 2013). 

14 For example, Eva Jullien, “Netzwerkanalyse in der Mediävistik. Probleme und Perspektiven
im Umgang mit mittelalterlichen Quellen,” Vierteljahrschrift für Sozial- und
Wirtschaftsgeschichte 100, no. 2 (2013), 135-53.
apart and without connection: one digs into and wants to understand the past,
while the other focuses on modern and future technologies. 


2 Inscriptions as sources 


Due to the lack of ancient literary texts, the main sources for my study were
inscriptions. The source material consisted of over 250 inscriptions,¹⁵ mostly found
in important cities like Lyon, Narbonne, and Trier, or along important roads and rivers.¹⁶

In the field of ancient economic history, especially regarding trade, inscriptions
play an important role. They are one of the few sources created by merchants
themselves or by people in their surroundings. They therefore offer more direct
access to trade and transport in antiquity. Nevertheless, there are several
challenges and obstacles in using inscriptions as sources.

The main challenge is the state of preservation of the inscriptions, which can
result in uncertain readings and different interpretations. Below are four
inscriptions that document trade or transport, with their texts according to the
Corpus Inscriptionum Latinarum (CIL), which is the most important collection of
Latin inscriptions.¹⁷ The reading of the texts is based inter alia on the meaning
of the abbreviations (e.g. NEG = negotiator) and comparisons with other
inscriptions, but also on assumptions.¹⁸


15 Admission criteria are professions (e.g. negotiator, mercator, nauta, navicularius), organiza- 
tions connected to trade/transport (e.g. Collegium negotiatorum Cisalpinorum et Transalpino- 
rum), or other connections to trade and transport (e.g. by the symbology of the inscription). 
Some inscriptions require further investigation (e.g. mercator can also appear as a
cognomen). Many other inscriptions that might indicate trade or transport were not
included because the connection was uncertain.

16 These rivers were major transport routes, especially the Rhône and Saône. Nautae (shippers)
of these rivers: Rhône: AE 1982, 703; 1997, 1130; CIL XII, 1667, 1797, 3317; XIII, 1716, 1967, 1996,
2002, 2494; Saône: AE 1975, 613; CIL VI, 29722; XII, 1005; XIII, 1709, 1911, 1954, 1972, 2009, 2020,
2028, 2041, 5096, 5489, 11179; Rhône/Saône: CIL XII, 3316; CIL XIII, 1688, 1695, 1918, 1960, 1966,
11480. See also Thomas Schmidts, Akteure und Organisation der Handelsschifffahrt in den
nordwestlichen Provinzen des Römischen Reiches (Mainz: Schnell & Steiner, 2011).

17 The work began in 1853; today, there are 17 volumes with over 180,000 inscriptions, with 
new supplements and editions added on a regular basis. 

18 For a list of publications see the Epigraphik-Datenbank Clauss/Slaby (EDCS,
http://db.edcs.eu/epigr/epi.php?s_sprache=en) and the Epigraphic Database Heidelberg (EDH,
https://edh-www.adw.uni-heidelberg.de/home?&lang=en). Symbols according to Leiden Conventions.
For an introduction to epigraphy see, for example, Christer Bruun and Jonathan Edmondson,
The Oxford Handbook of Roman Epigraphy (New York: Oxford University Press, 2015).
Fig. 1: Roman trade inscription CIL XIII, 1911, EDCS, EDCS-10500866, Location: Lyon, Date:
75-125,¹⁹ Text: C APRONIO APRONI //BLANDI FIL//RAPTORI//TREVERO//DEC EIVSD CIVITATIS//N
ARARICO PATRON//EIVSDEM CORPORIS//NEGOTIATORES// VINARI// LVGVD CON[SIST]ENTES
BENE DE SE M[ere]NTI //PATRO[n]O// CVIVS STATVA[E D]EDICA// TIONE SPORTVLAS// DED
NEGOT SING CORP XV © Manfred Clauss. EDCS, Epigraphik-Datenbank Clauss/Slaby.
https://db.edcs.eu/epigr/bilder.php?s_language=de&bild=$CIL_13_01911.jpg;$CIL_13_01911_1.jpg&nr=2,
accessed February 4, 2022.


19 Broekaert, Navicularii et negotiantes, 31. See also Amable Audin and Yves Burnand, “Chro- 
nologie des épitaphes romaines de Lyon,” Revue des Études Anciennes 61 (1959): 324-5; Peter
Kneißl, Die Berufsangaben auf den Inschriften der gallischen und germanischen Provinzen. Bei- 
träge zur Wirtschafts- und Sozialgeschichte der römischen Kaiserzeit vol. 2 (Marburg, 1977), 
138-9; Jean Krier, Die Treverer außerhalb ihrer Civitas. Mobilität und Aufstieg (Trier: Rheinisches 
The first inscription is easy to read (Fig. 1). It was erected by the wine
merchants of Lyon in honor of Caius Apronius Raptor, a patron of Lyon’s wine
merchants and of the nautae of the Saône. Caius Apronius? originated from Trier,
or rather the Treveri, where he was a decurio.
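
To indicate how a well-preserved inscription of this kind can feed into a network, the following sketch records the relations just described as a list of edges and counts each node's connections. It is only an illustration: the relation labels are hypothetical categories chosen for this example, the helper `build_graph` is not part of any tool used in the study, and the data model of the actual analysis may differ.

```python
from collections import defaultdict

# Relations read from the inscription as summarized above (CIL XIII, 1911).
# The relation labels are hypothetical categories chosen for this sketch.
edges = [
    ("Caius Apronius Raptor", "wine merchants of Lyon", "patron of"),
    ("Caius Apronius Raptor", "nautae of the Saône", "patron of"),
    ("Caius Apronius Raptor", "Treveri", "decurio of"),
    ("wine merchants of Lyon", "Caius Apronius Raptor", "dedicated inscription to"),
]


def build_graph(edge_list):
    """Build an undirected adjacency map; repeated pairs collapse into one tie."""
    graph = defaultdict(set)
    for source, target, _label in edge_list:
        graph[source].add(target)
        graph[target].add(source)
    return graph


graph = build_graph(edges)
# Degree: the number of distinct neighbours of each node.
degree = {node: len(neighbours) for node, neighbours in graph.items()}
print(degree)
```

For a fragmentary inscription, each such edge would additionally need a marker of how certain the underlying reading is, which this sketch leaves out.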

The second inscription is not as well preserved, but still quite readable 
(Fig. 2): Murranius Verus from the Treveri is described as a merchant for wine 
and ceramics. The third inscription is a small fragment found in Augst, with 
nine letters, two of which are barely recognizable (Fig. 3). 

It has been suggested that this fragment is part of an inscription of the Collegium
negotiatorum Cisalpinorum et Transalpinorum, an association of merchants who traded
across the Alps. This interpretation is likely but still speculative.²¹ The last
inscription was erected by the Helvetii in honor of their patron Quintus Otacilius
Pollinus, who was also the patron of the association of cisalpine and transalpine
slave traders and the association of the nautae of the Rhône and Saône. It is
preserved in numerous fragments but most of the inscription is missing (Fig. 4).²²

There are also other difficulties. The dating is not always clear:²³ some
inscriptions can be dated to a year, or even an exact date with day, month, and year, but
most of the time the dating is rather vague and inscriptions can only be dated to a
range of decades or centuries.²⁴ Furthermore, only a small part of the entire corpus


Landesmuseum Trier, 1981), 31-5; Lothar Wierschowski, Fremde in Gallien — “Gallier” in der 
Fremde. Die epigraphisch bezeugte Mobilität in, von und nach Gallien vom 1. bis 3. Jh. n. Chr. (Stutt- 
gart: Franz Steiner Verlag, 2001), 318-9; and Schmidts, Akteure und Organisation, 135-6. 

20 Different readings are possible, e.g. EDCS 10500989; Broekaert, Navicularii et negotiantes, 
no. 127; Krier, Die Treverer, no. 17; and Wierschowski, Fremde in Gallien, no. 494. 

21 Theophil Burckhardt-Biedermann, Die Kolonie Augusta Raurica. Ihre Verfassung und ihr 
Territorium (Basel: Helbing & Lichtenhahn, 1910), 5; Kolb and Ott, “Ein ‘Collegium negotiato- 
rum Cisalpinorum et Transalpinorum’ ”; Gerold Walser, “Corpus mercatorum cisalpinorum et 
transalpinorum,” Museum Helveticum. Schweizerische Zeitschrift für klassische Altertumswis-
senschaft 48 (1991): 169-75; Hans Sütterlin, “Altes und Neues zur Augster Curia. Zwei neue In- 
schriftenfunde aus dem Forumsbereich von Augusta Raurica (Grabung Curia-Schutzdach 
1998.51),” Jahresberichte aus Augst und Kaiseraugst 20 (1999): 159-80; and Ludwig Berger, 
Führer durch Augusta Raurica (Basel: Schwabe Verlag, 2012), 35-6. 

22 Joyce Reynolds, “Q. Otacilius Pollinus of Aventicum,” Bulletin de l’Association Pro Aventico 
20 (1969): 53-7; Regula Stolba-Frei, “Q. Otacilius Pollinus. Inquisitor III Galliarum,” in Alte Ge- 
schichte und Wissenschaftsgeschichte, ed. Peter Kneißl and Volker Losemann (Darmstadt: Wis- 
senschaftliche Buchgesellschaft, 1988), 186-201; and Oelschig, “Methode und Geschichte.” 

23 Inscriptions can be dated in different ways - for example, by the text and content (e.g. ref- 
erence to a known person like an emperor or consul, a dated event or location, certain abbre- 
viations, expressions, symbology), the appearance of the letters, or the archaeological context. 
24 For example, John Bodel, “Epigraphy and the ancient historian,” in Epigraphic Evidence. 
Ancient History From Inscriptions, ed. John Bodel (London: Taylor & Francis Ltd, 2001), 49-52; 
Alison E. Cooley, The Cambridge manual of Latin epigraphy (Cambridge: Cambridge University 
Fig. 2: Roman trade inscription CIL XIII, 2033, EDCS, EDCS-10500989, Location: Lyon, Date:
125-150,²⁵ Text: ?[Mur]RANIO V[ero]//[civi] TREVERO N[ego]//[tiat]ORI VINAR[io //[et (?) art]IS
CRETAR[iae] //[lug] CONSIST[enti]//[Mur]RAN (ius?) CONI[stans]// [fr]ATER ET H[eres] //?[agat]
HO APTER[us lib]//[p c] ET SVB [asc ded] © Manfred Clauss and Véronique Krier. EDCS,
Epigraphik-Datenbank Clauss/Slaby.
https://db.edcs.eu/epigr/bilder.php?s_language=de&bild=$CIL_13_02033.jpg;$CIL_13_02033_1.jpg;$VK_CIL_13_02033_2.jpg&nr=2,
accessed February 4, 2022.


of inscriptions is preserved and known today: most inscriptions have been lost. 
Géza Alföldy estimates the overall number of inscriptions that existed during the 
Roman Empire at between 20 and 40 million and notes that even this number 


Press, 2012), 398-434; and Christer Bruun and Jonathan Edmondson, “The Epigrapher at
Work,” in Oxford Handbook, ed. Bruun and Edmondson, 14-7. 

25 Broekaert, Navicularii et negotiantes, 84-6. See also Audin and Burnand, Chronologie des 
épitaphes, 325-6; Kneißl, Die Berufsangaben 2, 199-200; Krier, Die Treverer, 54-6; and Wier- 
schowski, Fremde in Gallien, 357-8. 
Fig. 3: Roman trade inscription CIL XIII, 5303, EDH, F027244, EDCS-ID: EDCS-10800707, EDH
ID: HD009215, Location: Augst, Date: Early Roman Empire,²⁶ Text: COL//CISAL © Krešimir
Matijević, Phototek CIL XIII/2 Flensburg/Trier. heidICON, Heidelberger Objekt- und
Multimediadatenbank, https://heidicon.ub.uni-heidelberg.de/iiif/2/1439557%3A788001,
accessed February 2022.


might still be too small.²⁷ Today, approximately 500,000 Latin inscriptions are
known,²⁸ which is between 1.3 and 2.6 percent of Alföldy’s estimate. Consequently,
inscriptions are not necessarily representative.²⁹ Related to this is the concept of


26 Kolb and Ott, “Ein ‘Collegium negotiatorum Cisalpinorum et Transalpinorum’ in Augusta
Rauricorum?” Zeitschrift für Papyrologie und Epigraphik 73 (1988): 107-10.

27 Géza Alföldy, “Römische Inschriftenkultur von Hispanien bis zum vorderen Orient. Die Er- 
folgsgeschichte eines antiken Kommunikationsmediums,” in Die epigraphische Kultur der 
Römer. Studien zu ihrer Bedeutung, Entwicklung und Erforschung, ed. Angelos Chaniotis and
Christian Witschel (Stuttgart: Franz Steiner Verlag, 2018), 37; and Géza Alföldy, “Die epigra- 
phische Kultur der Römer. Die Ausbreitung eines Kommunikationsmediums und seine Rolle 
bei der kulturellen Integration,” in Epigraphische Kultur, ed. Angelos Chaniotis and Christian 
Witschel, 70. 

28 Tommaso Beggio, “Epigraphy,” trans. Laurence Hooper, in The Oxford Handbook of Roman 
Law and Society, ed. Paul J. du Plessis, Clifford Ando, and Kaius Tuori (New York: Oxford Uni- 
versity Press, 2016), 43. Over 400,000 Latin inscriptions without instrumentum domesticum: 
Alföldy, Die epigraphische Kultur der Römer, 64; Géza Alföldy, “Tausend Jahre epigraphische 
Kultur im römischen Hispanien. Inschriften, Selbstdarstellung und Sozialordnung,” in Epigraphische
Kultur, 244; and Manfred Schmidt, “Carmina Latina Epigraphica,” trans. Orla
Mulholland, in Oxford Handbook, ed. Bruun and Edmondson, 764. More than 300,000 preserved
inscriptions: Francisco Beltrán Lloris, “The ‘Epigraphic Habit’ in the Roman World,” in
Oxford Handbook, ed. Bruun and Edmondson, 132, 136.

29 Werner Eck, “Befund und Realität. Zur Repräsentativität unserer epigraphischen Quellen 
in der römischen Kaiserzeit,” Chiron 37 (2007): 49-64. 
Fig. 4: Roman trade inscription CIL XIII, 11480, EDCS, EDCS-12200144, EDH ID: HD009430,³⁰
Location: Avenches, Date: after 138,³¹ Text:³² Q OTACIL[ ]O QVIR POLLINO //Q OTACIL[
]//CERIALI[ ] FILIO OMNIBVS HONOR[ ]BV[ ] //APVD SVO[ ] FVNCTO T[ ]R IMMVNIT[ ] //A DIVO [ ]
ADRI[ ] DON[ ]O INQVIS[ ] //II[ ]IAR PA[ ]NO VENAL TI //CISAL[ ]INO[ ] ET TRANSALPINORVM //
ITEM [ ]AVT[ ]R AR[ ]ICOR [ ]DANICOR //OB [ ]G[ ]A EIVS ERGA RE[ ]L ERGAQ //SIN[ ] VN[ ]VERSO[ ]
R[ ]A // HELV[ ]ATRONO [ ]S ET // [ ]IBTI[ ] [ ]B QV[ ]E SV[ ] © Krešimir Matijević, Phototek CIL
XIII/2 Flensburg/Trier. heidICON, Heidelberger Objekt- und Multimediadatenbank,
https://heidicon.ub.uni-heidelberg.de/detail/1440435, accessed February 2022.


the “epigraphic habit,” which goes back to Ramsay MacMullen³³ and implies that
the tradition of erecting stone inscriptions was not equally distributed across the
different parts of the Roman Empire, but changed over time and also depended
on social status. Another important factor in the preservation
of inscriptions was their material. Lastly, inscriptions only represent snapshots of
the time in which they were erected.
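
The proportion of surviving inscriptions quoted above can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and the second input (520,000, the size of the EDCS given in the following section) is an assumption about which count the quoted percentages may have been derived from.

```python
from typing import Tuple

# Bounds of Alföldy's estimate as quoted in the text.
ESTIMATE_LOW = 20_000_000
ESTIMATE_HIGH = 40_000_000


def share_percent(known: int, estimated_total: int) -> float:
    """Known inscriptions as a percentage of an estimated original total."""
    return 100 * known / estimated_total


def share_range(known: int) -> Tuple[float, float]:
    """(lowest, highest) share, rounded to two decimals, across both bounds."""
    return (
        round(share_percent(known, ESTIMATE_HIGH), 2),
        round(share_percent(known, ESTIMATE_LOW), 2),
    )


# 500,000 is the approximate figure in the text; 520,000 is an assumed
# alternative input (the EDCS total), used here only for comparison.
for known in (500_000, 520_000):
    low, high = share_range(known)
    print(f"{known:,} known inscriptions: {low}% to {high}% of the estimate")
```

With 500,000 known inscriptions the share is 1.25 to 2.5 percent; with 520,000 it is 1.3 to 2.6 percent. In both cases the surviving material is a very small fraction of what once existed.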

Studies based only on inscriptions should not be regarded as generalizable 
or generally valid, as they are based on only a very small fraction of the total 


30 Slightly different text: Helv[etii publ(ice) p]atrono [---]s et / [inscr?]ibti[on(es)?](!) [a]b qu[--- d]e su[o].

31 Date based on the mention of Divus Hadrianus. 

32 Stefan Oelschig, “Methode und Geschichte. Variationen zur Inschrift CIL XIII 11480,” in Arculiana.
Recueil d’hommages offerts à Hans Bögli, ed. Franz E. König and Serge Rebetez
(Avenches: LAOTT, 1995), 47-60.

33 Ramsay MacMullen, “The epigraphic habit in the Roman Empire,” The American Journal of 
Philology 103, no. 3 (1982): 233-46. 
number of inscriptions. Besides this, new discoveries or reinterpretations of already-known
inscriptions can change the state of our knowledge. Regarding ancient
economic history, important epigraphic documents that are now mostly
missing include, for example, freight lists and purchase or delivery contracts.
Nevertheless, this should not lead to the rejection of results based on epigraphy.


3 Epigraphy in the digital age: Epigraphic 
databases 


Digital resources have become essential in many aspects of the study of history.
This also applies to epigraphy, with epigraphic databases like the Epigraphik-Datenbank
Clauss/Slaby (EDCS) or the Epigraphic Database Heidelberg (EDH).³⁴
The EDCS was created in the 1980s to collect all known Latin inscriptions. It offers
various search options, such as the unique identity number (ID) of an inscription
within the database, publication, find spot and province, text, date,
material, and type of inscription, as well as the personal status of the people
mentioned in the inscription.³⁵ According to those responsible, the inclusion of
Latin inscriptions is almost complete and the database now includes 99.5 percent
of all published Latin inscriptions.³⁶ The EDH was founded in 1986 and has been
online since 1997. It also offers multiple search options, along with some additional
information such as the year of discovery, storage location, properties of the
inscription carrier and the inscription, and a list of the people mentioned in
the inscription. These search options in particular make working
with inscriptions much easier than working with the printed collections
of inscriptions.

A big difference between the databases, however, is their size. The EDCS consists
of around 520,000 inscriptions and is currently the most extensive digital
collection of Latin inscriptions, while the EDH has around 81,000 inscriptions.³⁷


34 Tom Elliott, “Epigraphy and Digital Resources,” in The Oxford Handbook of Roman Epigraphy,
ed. Bruun and Edmondson, 78-85.

35 For an explanation of these different options and guidelines for using the EDCS see
http://db.edcs.eu/epigr/hinweise/hinweis-en.html and for more detail in the German version see
http://db.edcs.eu/epigr/hinweise/hinweis-de.html, accessed August 30, 2020.

36 http://db.edcs.eu/epigr/hinweise/hinweis-de.html, accessed August 30, 2020. 

37 As of August 2020. Considering the research area of the dissertation, the numbers are as follows:
EDCS: ca. 110,000 inscriptions; EDH: ca. 13,500 inscriptions. Another sign of the higher
information density of the EDH (which was already shown by the listed categories of every
entry) is seen in the totals of dated inscriptions (Gaul and Germania, EDCS: 9,698; EDH: 10,012).
Hence the EDCS was used as the main resource for this study, with the EDH serving
as a complementary collection.
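
The difference in information density mentioned in note 37 can be expressed as the share of dated inscriptions in each database. This is a rough sketch: it assumes that the totals for the research area and the totals of dated inscriptions refer to the same region (Gaul and Germania), and the dictionary layout is purely illustrative.

```python
# Approximate figures from note 37 (research area, as of August 2020).
DATABASES = {
    "EDCS": {"inscriptions": 110_000, "dated": 9_698},
    "EDH": {"inscriptions": 13_500, "dated": 10_012},
}


def dated_share(entry: dict) -> float:
    """Percentage of inscriptions that carry a date, rounded to one decimal."""
    return round(100 * entry["dated"] / entry["inscriptions"], 1)


for name, entry in DATABASES.items():
    print(f"{name}: {dated_share(entry)}% of entries are dated")
```

On these approximate totals, roughly 9 percent of the EDCS entries for the research area are dated, compared with roughly 74 percent of the EDH entries. The EDCS offers breadth, the EDH depth, which is consistent with using one as the main resource and the other as a complement.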

While using the EDCS during the study, several problems arose. For example,
some inscriptions were found to be included more than once, while others did not
belong to the Roman era.³⁸ Another issue became clear when searching for inscriptions
from Gaul and Germania: a search across the Gaulish and German
provinces combined returned 109,421 inscriptions,³⁹ but searching for
inscriptions in the two provinces individually resulted in almost 4,000 additional
inscriptions being shown. Some locations in the database, such as Dijon or Langres,
are assigned to multiple provinces (e.g. “Belgica | Germania inferior” and
“Belgica | Germania superior”) and so appeared more than once in the second
search. Furthermore, some places are matched with the wrong province. The EDCS
classifies Colijnsplaat as part of Gallia Belgica, although the course of the border
between Gallia Belgica and Germania inferior is not clear.⁴⁰ Another problem concerns
the text of the inscriptions in the EDCS. Although there is a list of references
for various editions or publications of the inscription, it is not clear which reading
the database follows. Deviating readings are not mentioned, and critical or unclear
points in the text are not marked. Sometimes the EDCS can be proven wrong by
consulting the drawing in the CIL or the linked image in the database itself.
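
The discrepancy between the combined and the individual province searches can be reproduced with a toy example. The records, IDs, and function names below are hypothetical and do not reflect the actual data model or search interface of the EDCS; the sketch only shows why summing per-province hits overcounts places assigned to several provinces, and how counting unique IDs avoids this.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical records: inscription ID -> provinces assigned to its find spot.
RECORDS: Dict[str, Set[str]] = {
    "demo-1": {"Belgica", "Germania superior"},  # place listed under two provinces
    "demo-2": {"Belgica", "Germania inferior"},  # place listed under two provinces
    "demo-3": {"Belgica"},
    "demo-4": {"Germania inferior"},
}


def search(records: Dict[str, Set[str]], province: str) -> List[str]:
    """IDs of all records whose find spot is assigned to the given province."""
    return sorted(rid for rid, provs in records.items() if province in provs)


def summed_hits(records: Dict[str, Set[str]], provinces: Iterable[str]) -> int:
    """Add up the hits of one search per province (double-counts shared places)."""
    return sum(len(search(records, p)) for p in provinces)


def unique_hits(records: Dict[str, Set[str]], provinces: Iterable[str]) -> int:
    """Count each inscription once, however many provinces it is assigned to."""
    ids: Set[str] = set()
    for p in provinces:
        ids.update(search(records, p))
    return len(ids)


PROVINCES = ["Belgica", "Germania inferior", "Germania superior"]
surplus = summed_hits(RECORDS, PROVINCES) - unique_hits(RECORDS, PROVINCES)
print(f"summed: {summed_hits(RECORDS, PROVINCES)}, "
      f"unique: {unique_hits(RECORDS, PROVINCES)}, surplus: {surplus}")
```

In this toy dataset the individual searches add up to six hits for only four inscriptions. The surplus of two corresponds exactly to the two records assigned to more than one province.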

Lastly, the EDCS offers neither a description of the symbological features of an inscription nor the
option to search for inscriptions based on them. The symbology
is an important part of the inscription and often adds crucial information. The
inscription on the famous Igel Column (Trier, Germany) gives no information on
the owners except for their names, but the large reliefs on its sides reveal that the
owners had a role in the textile industry (CIL XIII 4206).⁴¹


38 An example of both is EDCS-44100009/EDCS-54900666. These two IDs refer to the same inscription,
which is therefore included twice in the EDCS. It also dates from the Middle Ages (Matthieu
Michler, Les Vosges (Paris: Académie des Inscriptions et Belles-Lettres, 2005), no. 415 with
further literature). See also Beltrán Lloris, “The ‘Epigraphic Habit’ in the Roman World,” 136-41.
39 As of February 2020.

40 Inter alia Wolfgang Spickermann, Religion in den germanischen Provinzen Roms (Tübingen:
Mohr Siebeck, 2001), 8-13; Broekaert, Navicularii et negotiantes, no. 14, 35, 37, 48, 50, 65, 74, 91, 
94, 154, 157, 163, 168, 191, 203, 353, 1229, 1237, 1246, 1248, 1257, 1267, 1285, 1291, 1304, 1317; and 
Andreas Kakoschke, Ortsfremde in den römischen Provinzen Germania inferior und Germania 
superior. Eine Untersuchung zur Mobilität in den germanischen Provinzen anhand der Inschriften 
des 1. bis 3. Jahrhunderts n. Chr. (Möhnesee: Bibliopolis, 2002), 6: both treat Colijnsplaat as part 
of Germania inferior. 

41 For the similar CIL XII, 264: the text provides no information on the profession of the de- 
ceased, but the relief shows the transport of wine barrels. He was probably active in wine 
transport and perhaps a wine merchant. 
Despite these problems, the EDCS is an important and valuable research
tool and most of its disadvantages can be avoided by using a critical approach.

Having discussed the inscriptions as sources, I now turn to what
the gaps and uncertainties in the epigraphic sources mean for the application
of (social) network analysis.


4 The challenges of approaching fragmented 
sources using network analysis 


It was the goal of identifying networks and other long-term collaborations be- 
tween merchants that led me to consider using social network analysis in my 
study. CIL XIII, 8338 and 8568 were important for that decision as they mention 
two merchants, Tertinius Secundus and Priminius Ingenuus. Secundus was 
married to Priminia Sabina who may have been related to Ingenuus, thus pro- 
viding the possibility of a social network and collaboration between the Tertinii 
and Priminii. 

The value of using network analysis always depends on the quality and 
quantity of the sources, and the research question. As shown earlier, in ancient 
history in particular, historians often have to rely on fragmentary sources that 
contain uncertainties or do not contain all the information needed. Gaps in 
sources result in incomplete and unfinished networks that can change and 
have to be treated critically. Nonetheless, epigraphic materials that contain all 
the necessary information on potential relationships seem to call for a network-analytical
approach.⁴²

Several issues arose during my study. First, the dating of the sources was 
problematic. Second, only a fraction of the people in the networks created were 
actually merchants; most of the rest were probably family members, with no 
information on their professions available. Third, even the previous classifica- 
tion as “probably family members” is often questionable, as is the collaboration 
between them. The network of the Priminii, Sentii, and Tertinii shown in Fig. 5 
demonstrates these problems. 


42 Shawn Graham and Giovanni Ruffini, “Network Analysis and Greco-Roman Prosopogra- 
phy,” in Prosopography Approaches and Applications. A Handbook, ed. Katharine S. B. Keats- 
Rohan (Oxford: University of Oxford Unit for Prosopographical Research Linacre College, 
2007), 325-36; and Shawn Graham, “On Connecting Stamps. Network Analysis and Epigra- 
phy,” Les Nouvelles de l’Archéologie 135 (2014): 39-44. 
Fig. 5: Priminii-Sentii-Tertinii network: Tertinius Secundus and Priminius Ingenuus are
attested merchants; Tertinii, Sentii, Priminii are families. 2020. © Jan Lotz.


This network is based on 17 inscriptions, mentioning 27 people.⁴³ According
to Broekaert, CIL XIII, 8338 is dated between AD 100 and 220 and CIL XIII,
8568 between AD 175 and 250, which adds up to a range of 150 years.⁴⁴ It is
possible that Tertinius Secundus and Priminius Ingenuus lived at the same
time, but also that they did not, which makes the attempt to create a network
between them highly questionable. The same applies to the other inscriptions.⁴⁵
The longevity of the network is also debatable, as a single inscription
only represents a snapshot in time. In Colijnsplaat, two inscriptions were found:


43 AE 1926, 128, 130, 131; 1975, 643; 2001, 1457, 1461, 1472, 1476, 1492; CIL XIII, 1897, 5482, 
7394, 7899, 8338, 8545, 8568, 8601; Finke 307. 

44 Broekaert, Navicularii et negotiantes, no. 144, 175; Kneißl, Die Berufsangaben 2, 191-3, 195; 
Wierschowski, Fremde in Gallien, no. 575; Kakoschke, Ortsfremde in den römischen Provinzen, 
no. 1.37, 9.4; Andreas Kakoschke, Die Personennamen in den zwei germanischen Provinzen vol. 1 
(Rahden: Marie Leidorf, 2006), GN 989, 1281; and Brigitte Galsterer and Hartmut Galsterer, Die rö- 
mischen Steininschriften aus Köln (Mainz: Verlag Philipp von Zabern, 2010), no. 430. 

45 Kakoschke, Die Personennamen 1, GN 989, 1150, 1281. 
one that had been erected by Sentius Atticus and Tertinius Quartinus and the
other by Tertinius Virilis and Marius Agilis (AE 2001, 1461, 1476). If they were collaborating
merchants, the question that arises is whether this was a singular occurrence
or whether they collaborated on a regular basis. It is not possible to
answer this question,⁴⁶ but it remains important for the creation and credibility
of networks, since singular events or collaborations do not result in (trading)
networks.
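
The dating problem described above for CIL XIII, 8338 and CIL XIII, 8568 can be made concrete by comparing the two date ranges. The following is a minimal sketch under the assumption that each inscription is represented only by its earliest and latest possible year AD; the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class DateRange(NamedTuple):
    earliest: int  # earliest possible year (AD)
    latest: int    # latest possible year (AD)


def overlap(a: DateRange, b: DateRange) -> Optional[DateRange]:
    """Period in which both inscriptions could have been erected, if any."""
    start = max(a.earliest, b.earliest)
    end = min(a.latest, b.latest)
    return DateRange(start, end) if start <= end else None


def combined_span(a: DateRange, b: DateRange) -> int:
    """Years between the earliest and latest possible date of the pair."""
    return max(a.latest, b.latest) - min(a.earliest, b.earliest)


# Date ranges according to Broekaert, as cited in the text.
cil_xiii_8338 = DateRange(100, 220)
cil_xiii_8568 = DateRange(175, 250)

shared = overlap(cil_xiii_8338, cil_xiii_8568)
print(f"possible overlap: {shared}")
print(f"combined span: {combined_span(cil_xiii_8338, cil_xiii_8568)} years")
```

The two ranges share only the 45 years between AD 175 and 220, while the pair as a whole spans 150 years. An overlap shows that contemporaneity is possible, not that it is likely, so a tie drawn between the two merchants remains a hypothesis.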

Commercial activities are confirmed for Tertinius Secundus and Lucius Pri- 
minius Ingenuus via the mention of their profession. But the inscriptions give 
no information on the profession of any of the others, although the Tertinii are 
considered to have been active in the cross-Channel trade between Britannia 
and the Gallic and Germanic provinces; the same might apply to Sentius Atticus 
and Marius Agilis. 

Furthermore, the other Sentii and two Priminii (AE 2001, 1457, 1461, 1476;
CIL XIII, 5482, 8545) might also have been merchants.⁴⁷ Some of the actors in
the network were decuriones or veterans, who could have been involved in
trade (AE 1926, 128, 130, 131; Finke 307; CIL XIII, 1897, 7394, 8601).⁴⁸ A cooperation
between a textile dealer and someone dealing in bread might have been
based on supplying soldiers near the Roman-Germanic border or the area
around Cologne. Potentially, Priminius Ingenuus, as a negotiator vestiarius importator,
imported textiles to the Cologne area, and the Nervian Tertinius Secundus
sold bread and similar goods from the hinterland of the Belgica at the
border. Regarding the Tertinii and Sentii from Colijnsplaat, it is possible that
they were merchants, e.g. negotiatores allecarii (of fish sauces) or negotiatores
salarii (of salt),⁴⁹ or sailors (moritex), thus providing a connection to Britain
for the export of grain products and the import of clothes. Perhaps the purpose
of this network was the expansion of trading opportunities and goods, in order
to achieve an advantageous or predominant position in supplying the border

46 Another inscription from Colijnsplaat mentions a second Tertinius Virilis (AE 1975, 643) 
who can probably be considered to be the same person as the first one. He erected this inscrip- 
tion by himself which might indicate that the hypothetical collaboration with Marius Agilis 
was a singular occurrence. 

47 Broekaert, Navicularii et negotiantes, no. 175; and Kakoschke, Die Personennamen, GN 989, 
1150, 1281. 

48 Multiple decuriones and veterans from Gaul and Germania were active in trade or transport.
Decuriones: AE 1975, 630, 646; CIL XIII, 1688, 1695, 1911, 5116, 11179; veterans: CIL XII,
1906, 6677, probably CIL 8267a and 11812.

49 These were the most common types of merchant in Colijnsplaat. Petrus J. Stuart and Julia- 
nus E. Bogaers, Nehalennia. Römische Steindenkmäler aus der Oosterschelde bei Colijnsplaat 1 
(Leiden: Rijksmuseum van Oudheden, 2001), 34-7. 
region, or the organization of joint, and thus simpler, business trips to Britain.
Most members of this network are confirmed to have been located in the
northeastern part of the research area, with the exception of Tertinius Gessius
(Lyon), Priminius Ursulus (Dijon) and Tertinius Catullinus (Friedberg). Maybe
they can be interpreted as links to the rest of the Gallic and Germanic business
world (especially Lyon).

But, no matter how tempting these ideas seem, there is no evidence that
any of these people, except Tertinius Secundus and Priminius Ingenuus, were
merchants. Limiting the network to these two individuals (and the people mentioned in
the same inscriptions) yields a plausible but negligible network visualization.

Even this reconstruction is highly speculative since the inscriptions do not
confirm that these people definitely collaborated⁵⁰ or were related as members
of the same families.⁵¹ The network is based on assumptions, with no evidence
that it really existed. These problems are not limited to this specific network,
but also apply to others found.

However, the application of network analysis is not limited to particular epochs or
subjects. The problems described here were the reason that I shifted my focus
from social to spatial networks. Trade and transport are always connected to
mobility, and inscriptions can be quite expressive in terms of such information
(e.g. information on trading places, route, goods, origin of the goods or merchant).
While the sources did not change and some of the problems mentioned
earlier still existed, they were no longer as serious a hindrance, for example
regarding dating. While a relationship between two merchants is possible, it is
usually based on at least two inscriptions. For the relationship to be possible in
the first place, the two inscriptions have to be dated to a similar period. But for
a connection between two cities, one inscription is sufficient, so the simultaneity
of two different sources is no longer necessary. But there are new problems,
such as the role of the find spot of the inscription or the place of origin of
the merchant.


50 This also applies to people mentioned in the same inscription, for example Tertinius Quar- 
tinus and Sentius Atticus. Although it is rather likely that they did cooperate (assuming both 
of them were merchants), it is also possible that there were other reasons, e.g. to save money, 
to erect the inscription together. 

51 Due to the geographical proximity, it is quite possible that the Tertinii and the Priminii in 
the northeast were related, but there is no clear evidence for this. For more on collaboration 
between families see Wim Broekaert, “Welcome to the family! Marriage as business strategy in
the Roman economy,” Marburger Beiträge zu antiken Handels-, Sozial- und Wirtschaftsgeschichte
30 (2012).
Without implying any direction of the trade, edges in the network usually
run between the find spot and the location mentioned in the inscription. The
find spot serves as the merchant’s “base.” However, generalizing this approach
can lead to mistakes. A bronze plaque that had been reworked into the bottom
of a pot (CIL III, 14165,8), and which documents a conflict between the navicularii
of Arles and the administration in Rome, was found in Beirut. In this case,
the find spot should be ignored. Although researchers have repeatedly assumed
the existence of a branch of the navicularii of Arles in Beirut because of this inscription,
there is no further evidence for it and it is possible that the document
reached Beirut in a variety of other ways.⁵² A person’s place of origin is often
equated with the trading place. For Murranius Verus (CIL XIII, 2033) from the
civitas Treverorum, it “[. . .] seems feasible that Verus was mainly shipping Gallic
and possibly Mediterranean wine to the northern provinces [. . .]. On the
way back from Trier [to Lyon, where his epitaph was found], he then may have
been concentrating on ceramics.”⁵³ Similar cases include CIL XIII, 1998⁵⁴ and
CIL XIII, 1911, 11179.⁵⁵ Although the assumption that a merchant had local contacts
in his homeland and maintained business connections to this place seems
likely, thus justifying the equation mentioned, it is not usually verifiable.

Ultimately, there is no procedure that is universally applicable. Only a one- 
by-one examination of the inscriptions and the locations mentioned, based on 
the individual assessment of the historian, can help decide on their value and 
function for network creation. 
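
The edge rule described in this section, together with its case-by-case exceptions, can be sketched as follows. This is not the procedure used in the study but a minimal illustration: the data structure, the `use_find_spot` flag, and the choice to link the mentioned places to each other when the find spot is ignored are all assumptions, and “Trier” stands in for the civitas Treverorum as a simplification.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple


class Inscription(NamedTuple):
    reference: str
    find_spot: str
    places_mentioned: Tuple[str, ...]
    use_find_spot: bool = True  # set by the historian after individual assessment


def edges_for(inscription: Inscription) -> Set[FrozenSet[str]]:
    """Undirected edges for one inscription (no trade direction implied)."""
    if inscription.use_find_spot:
        # Default rule: find spot <-> each place mentioned in the text.
        pairs = ((inscription.find_spot, place)
                 for place in inscription.places_mentioned)
    else:
        # Assumed fallback: connect only the places mentioned in the text.
        pairs = combinations(inscription.places_mentioned, 2)
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def build_network(inscriptions: Iterable[Inscription]) -> Set[FrozenSet[str]]:
    """Union of the edges of all inscriptions."""
    network: Set[FrozenSet[str]] = set()
    for inscription in inscriptions:
        network |= edges_for(inscription)
    return network


SAMPLE = [
    Inscription("CIL XIII, 2033", "Lyon", ("Trier",)),
    # Plaque reused as the bottom of a pot: the find spot is disregarded.
    Inscription("CIL III, 14165,8", "Beirut", ("Arles", "Rome"),
                use_find_spot=False),
]

network = build_network(SAMPLE)
for edge in sorted(sorted(e) for e in network):
    print(" - ".join(edge))
```

Edges are stored as unordered pairs because the text deliberately leaves the direction of trade open. The flag makes the historian's decision explicit and documentable for each inscription instead of hiding it in a general rule.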


52 Peter Kneißl, Die Berufsangaben auf den Inschriften der gallischen und germanischen Provinzen.
Beiträge zur Wirtschafts- und Sozialgeschichte der römischen Kaiserzeit vol. 1 (Marburg 1977),
206-7. For more details see Catherine Virlouvet, “Les naviculaires d’Arles. A propos de l’inscription
provenant de Beyrouth,” Mélanges de l’École française de Rome, Antiquité 116, no. 1 (2004):
327-70; Mireille Corbier, Donner à voir, donner à lire. Mémoire et communication dans la Rome
ancienne (Paris: CNRS, 2006), 233-42; and Schmidts, Akteure und Organisation, 62-7, 146.

53 Broekaert, Navicularii et negotiantes, no. 127. See also Wierschowski, Fremde in Gallien, 
no. 494. Against this see Krier, Die Treverer, no. 17. 

54 Peter Kneißl, “Die utriclarii. Ihre Rolle im gallo-römischen Transportwesen und Weinhandel,”
Bonner Jahrbücher 181 (1981): 185; Wierschowski, Fremde in Gallien, no. 468; and Broekaert,
Navicularii et negotiantes, no. 372.

55 Krier, Die Treverer, no. 7, 8; Wierschowski, Fremde in Gallien, no. 443, 586; and Broekaert, 
Navicularii et negotiantes, no. 13. 




5 Conclusion 


One of the key problems of network analysis in ancient history, if not the key
problem, is the source situation, which is vastly different from that of modern history.
Not only is the number of sources different, but also the information density
and quality. In contrast to modern history, the sources in ancient history, if
they exist, are often heavily fragmented with uncertain content and meaning.
This makes it harder to gather sufficient and reliable data for network analysis.
Broekaert warns of the limitations of network analysis and of the use of overly
extensive mathematical calculations. Ancient historians “always work with
fragmentary networks, isolated glimpses of a wide set of relationships.”⁵⁶

To a certain extent, incorporating network analysis as a key part of my the- 
sis set the research focus and questions. Was its use justified? As outlined, 
some serious problems came with this approach. Social network analysis could 
not be used to its full potential and the results did not meet my expectations: 
the sources simply did not support this approach. However, the number of 
sources was not the main obstacle, but rather the uncertainties, especially in 
the form of dating. 

Should this study therefore be seen as a failure? The University of Luxembourg’s
doctoral training unit in digital history and hermeneutics aims to “provide
an experimental space” with the concept of “‘thinkering’ [as] the playful
experimentation with digital tools and technologies for historical research.”⁵⁷
And experiments will sometimes lead to a negative result or a dead end. In this
case, the sources and data available for my study were too incomplete for its purposes
and did not allow coherent social networks to be created. As a result, a
meaningful analysis was not possible.

On the other hand, the non-applicability of social network analysis led to
my taking a closer look at the sources and to experimenting with them regarding
network analysis more generally. Maybe the value should be seen in dealing
with the problems, rather than in the actual outcome. Malte Rehbein warns
of a “marginalization of criticism” (Marginalisierung der Kritik), with the displacement
of critical questions in favor of results, as one of the risks of the digital
revolution.⁵⁸ This applies not only to thematic, but also to methodological


56 Wim Broekaert, “Financial experts in a spider web. A social network analysis of the ar- 
chives of Caecilius Iucundus and the Sulpicii,” Klio 95 (2013): 474. 

57 Andreas Fickers and Tim van der Heijden, “Inside the Trading Zone. Thinkering in a Digital 
History Lab,” Digital Humanities Quarterly 14, no. 3 (2020). 

58 Malte Rehbein, “‘L’historien de demain sera programmeur ou il ne sera pas.’ (Digitale) Geschichtswissenschaften
heute und morgen,” Digital Classics Online 4, no. 1 (2018): 37.
questions. Leif Scheuermann urges a documentation not only of the results but
also of the research process, to make the digital hermeneutic process understandable
and communicable.⁵⁹ This also includes describing problems and
setbacks, and a critical reflection on one’s own research process.

What place does network analysis have in an environment of fragmented,
uncertain, and scarce sources? First and foremost, it is not the researcher who
decides the applicability, use, and type of network analysis; the sources do.
The results have to be interpreted based on and in even closer connection to
the sources than in the case of “complete” source material. Detailed knowledge
of the sources and especially their shortcomings is key. So is communicating
them. The documentation of the research process is especially important since
digital methods can quickly produce impressive-looking results that can be
hard for others to understand and verify. Furthermore, not only is methodological
knowledge required but, even more importantly, methodological
awareness.⁶⁰

In the case of fragmented sources, limiting oneself to network visualizations with no or only
limited further mathematical analysis, together with a metaphorical use of the term “network”
or a more descriptive approach, can be more appropriate, as seen in the works of Broekaert,
although this has been criticized by Rollinger. Therefore, network
“analysis”⁶¹ in this study is mostly a way of showing connections between people
or, rather, cities. Nonetheless, fragmented or uncertain sources should not discourage
analysis, as long as their weaknesses are kept in mind and addressed. “Historical
network research into the ancient world will probably never (or only in very
exceptional cases) be able to present analyses as detailed or encompassing as
much information as network analysis is able to in contemporary sociological research
or even in SNA [social network analysis] of the early modern and modern
period. Both network researchers and ancient historians should accept this.”⁶²


59 Leif Scheuermann, “Die Abgrenzung der digitalen Geisteswissenschaften,” Digital Classics 
Online 2, no. 1 (2016): 58-67. 

60 Regarding the Middle Ages, but with similar conclusions, see Robert Gramsch, “Zerstörte 
oder verblasste Muster? Anwendungsfelder mediävistischer Netzwerkforschung und das Quel- 
lenproblem,” in Handbuch Historische Netzwerkforschung, ed. Düring et al., 85-99. 

61 Network visualization would be a more fitting description. 

62 Rollinger, “Prolegomena,” 26. 
Floor Koeleman
Re-viewing the constcamer 
A digital approach to seventeenth-century pictures of collections 


1 Introduction 


In 2019, 100 selected masterpieces of Dutch and Flemish art (ca. 1350-1750) were presented
to the public and the art world as the CODART Canon. The final list had been
compiled by members of the CODART international network of curators of Dutch and
Flemish art, after a public vote.¹ No fewer than two constcamer paintings were included
in the final selection: The Five Senses (1617-1619) by Jan Brueghel I and Peter Paul
Rubens, which is actually a series of five paintings, and The Picture Gallery of
Cornelis van der Geest (1628) by Willem van Haecht II.² This demonstrates how popular
constcamer paintings are among the public and art professionals.

A constcamer is a specific type of painting created mainly for the Antwerp 
art markets in the seventeenth century. It depicts a room with a rich collection 
of paintings, musical and scientific instruments, animals, plants, people, and 
many other interesting elements that were of significant cultural relevance for 
the period. Despite its popularity, the genre is not well researched, and no complete
overview exists to this day.³ My PhD project aims not only to create a corpus
of the constcamer paintings that have been preserved, but also to study
their rich and complex content. This chapter explains the rationale behind the 
use of digital tools and methodologies to collect, archive, and analyze a dataset 
of over 160 constcamer paintings and the information relating to them. 


1 The CODART Foundation, “About the CODART Canon,” accessed September 28, 2020.
https://canon.codart.nl/about/.

2 The CODART Foundation, “100 Masterpieces,” accessed September 28, 2020.
https://canon.codart.nl/.

3 The main reference work on the genre remains Simone Speth-Holterhoff, Les Peintres Fla- 
mands de Cabinets d’Amateurs au XVIIe Siècle (Brussels: Elsevier, 1957). For the historiography 
of the genre see Alexander Marr, “The Flemish ‘Pictures of Collections’ Genre: An Overview,” 
Intellectual History Review 20, no. 1 (2010): 5-25. 


Acknowledgments: I would like to thank Michael Korey for his insightful feedback on earlier
drafts, and the Luxembourg National Research Fund (FNR) (10929115), who funded my research.


Open Access. © 2022 Floor Koeleman, published by De Gruyter. This work is licensed under
the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110723991-010
2 Digital art history


In 2017, the lack of availability of datasets was characterized in the report on the
symposium Art History in Digital Dimensions as “the primary obstacle for many
art historians and students.”⁴ Creating datasets is currently the main work being
done in the field of digital art history and, at the same time, an ongoing trend to
digitize museum collections is contributing to the accessibility of artworks. “Yet
even with these available resources,” the 2017 report continues, “the majority of
researchers will have to develop their own dataset. For many, compiling this dataset
has the potential to be more challenging than mastering new software.”⁵ It
requires a way of working in which art historians are not usually trained.

Digital art history “has become a shorthand reference to the potentially
transformative effect that digital technologies hold for the discipline of art
history.”⁶ In the 2013 special issue of Visual Resources dedicated to digital art
history, Drucker posed the controversial question, “Is There a ‘Digital’ Art History?”
She proposed a distinction between digitized art history, characterized as making
use of online repositories and images, and digital art history, where computational
technology allows the use of analytic techniques.⁷ Computational analysis
alone, however, cannot replace argumentation and interpretation.⁸ Subsequent
research has shown that Drucker’s distinction no longer holds.⁹


4 Stephen Bury et al., “Art History in Digital Dimensions: The White Paper” (Washington DC: 
Frick Art Reference Library, 2017), 11, accessed June 17, 2021, http://dah-dimensions.org/report/. 
5 Bury et al., “Art History in Digital Dimensions,” 11. 

6 The Getty Foundation, “Digital Art History,” accessed September 28, 2020,
https://www.getty.edu/foundation/initiatives/current/dah/.

7 Johanna Drucker, “Is There a ‘Digital’ Art History?,” Visual Resources 29, nos. 1-2 (2013): 7; 
Benjamin Zweig, “Forgotten Genealogies: Brief Reflections on the History of Digital Art His- 
tory,” International Journal for Digital Art History 1 (2015): 37-49; and Anna Bentkowska-Kafel, 
“Debating Digital Art History,” International Journal for Digital Art History 1 (2015): 51-64. 

8 Claire Bishop, “Against Digital Art History,” Humanities Futures, Franklin Humanities
Institute, 2017, accessed September 20, 2020,
https://humanitiesfutures.org/papers/digital-art-history/. According to Hans Brandhorst
the “real question is whether in documenting our sources
the field will ever be able to keep one step ahead of researchers, providing them with
ready-made answers when they are asking new questions.” Hans Brandhorst, “Aby Warburg’s 
Wildest Dreams Come True?,” Visual Resources 29, nos. 1-2 (2013): 76. 

9 Georg Schelbert, “Digital Art History - Digitale Kunstgeschichte, Überlegungen zum Ak- 
tuellen Stand,” in Computing Art Reader: Einführung in die Digitale Kunstgeschichte, ed. Piotr 
Kuroczynski, Peter Bell, and Lisa Dieckmann, Computing in Art and Architecture, vol. 1 (Hei- 
delberg: arthistoricum.net, 2018), 54. In her latest publication, Drucker fully acknowledges the 
importance of interpretation for the humanities. See Johanna Drucker, Visualization and
Interpretation: Humanistic Approaches to Display (Cambridge, MA: The MIT Press, 2020).


It remains to be seen if computational analysis will ever gain the same importance
in art history as in disciplines within the humanities that are primarily
text-based.¹⁰ In art history, material artifacts without inherent digital representation
are traditionally the starting point of study. As Schelbert pointed out, the
interpretation of art and its historical context is an intellectual and theoretical
process. But the ways in which information is structured and links between
data are made influence the interpretation of that data.¹¹ The latest digital art
history special edition of Visual Resources (2019) similarly highlights that “creating
a database is anything but straightforward and that its complications cannot
be separated from disciplinary, socio-historical, and ideological contexts.”¹²

The reassessment of the current state of research in the field of digital art 
history mainly reveals that “data sets are not ‘interpretations’ or ‘conclusions’ 
in and of themselves; all hypotheses and interpretations must be made by examining
data in conjunction with historical knowledge and taking into consideration
the contexts in which the works and artists exist.”¹³ However, the focus
on databases within digital art history seems to come at a cost. 

In 2012 Schelbert identified “image analysis and image annotation” (Bilda- 
nalyse und Bild-Annotation) as one of the six areas of work in digital art history. 
This aspect had disappeared from his list of 2018. A similar trend can be dis- 
cerned in the contributions to The Routledge Companion to Digital Humanities 
and Art History of 2020.¹⁵ None of the thirty-four chapters deals explicitly with
the analysis and annotation of images. Whenever images are referenced in this 
book, the focus is limited to the formal analysis of artworks rather than offering 
interpretations of what is depicted and its associated meanings. Traditionally 
the latter has been at the heart of art historical research. 


10 Schelbert, “Digital Art History,” 48; and Lev Manovich, “Data Science and Digital Art His- 
tory,” International Journal for Digital Art History 1 (2015): 13-35. 

11 Schelbert, “Digital Art History,” 54-5. 

12 Murtha Baca, Anne Helmreich, and Melissa Gill, “Digital Art History,” Visual Resources 35, 
nos. 1-2 (2019): 2. 

13 Baca, Helmreich, and Gill, “Digital Art History,” 3. 

14 Schelbert, “Digital Art History,” 45. The 2018 list consists of: innovative search strategies 
and tools; cross-media semantic linking and enrichment of information units; social media; 
reception research; digital visualizations and diagrams; and digital communication of art his- 
torical knowledge. 

15 Kathryn Brown, ed., The Routledge Companion to Digital Humanities and Art History,
Routledge Art History and Visual Studies Companions (New York: Routledge, 2020).


3 Looking, seeing, understanding


The discipline of art history revolves around objects (e.g. paintings) and images 
(e.g. that which is represented in paint). Stories on the origins of art in general, 
and painting in particular, can already be found in Pliny the Elder’s Naturalis 
Historia (77-79 AD), for example.¹⁶ They all have in common that the outlines
of a person’s shadow are traced. By the seventeenth century the art of painting 
had definitely become more sophisticated and a wide variety of materials was 
being used to create and support the image. The study of constcamer paintings 
within this project is primarily concerned with the analysis and interpretation 
of the image, regardless of its materiality.¹⁷

For example, it is certainly impressive to experience the grandeur of Rembrandt’s
The Night Watch (1642) physically and aesthetically in the Rijksmuseum.¹⁸
But in order to examine and understand the iconographic meaning
embedded in the image, a meaning which is both sociohistorically and culturally
determined, the artwork can equally be studied from a screen, print, or
any other form of reproduction.¹⁹

To study constcamer paintings, this project does not focus on applying one 
single method or theory. In line with the recommendations of Lorenz, I am 
using a “multilateral, multi-method approach” combining formalized methods 
such as iconology, semiotics, and image studies in order to study and interpret 
these images.²⁰ This means that, first of all, the pictorial properties of the artworks
are looked at. The content of the images informs analysis and dictates
the subsequent research necessary for interpretation. This is a process of looking,
seeing or cognitively identifying what it is we are looking at, and determining
meaning.²¹ In addition I document part of this process textually by means
of annotations.


16 Pliny, The Natural History of Pliny, trans. John Bostock and Henry T. Riley (London:
H. G. Bohn, 1855), 35.5.

17 The branch of art history that deals with the materiality of artworks is called technical art
history.

18 See Christopher Morse’s contribution to this volume, Chapter 13.

19 See for example the highly detailed photograph of The Night Watch available via Rijksmuseum,
“Operation Night Watch,” accessed October 15, 2020,
https://www.rijksmuseum.nl/en/nightwatch.

20 Katharina Lorenz, Ancient Mythological Images and Their Interpretation: An Introduction to
Iconology, Semiotics and Image Studies in Classical Art History (Cambridge: Cambridge University
Press, 2016), 245. This book is an excellent resource for those who are not yet familiar with
the study of images and their interpretation.


Annotating or adding information about what is represented in constcamer 
paintings poses a great challenge, mainly because there are no complementary 
sources that go with these pictures. Consequently, it can be very difficult to es- 
tablish what you “see” when you do not know exactly what you are looking at. 
This difference between looking and seeing has already been discussed by
Ludwik Fleck (1896-1961) in his 1947 paper on the philosophy of science entitled “To
Look, to See, to Know.”²² “Fleck distinguishes between ‘looking’ and ‘seeing’ –
the former referring to the physiological process of visual perception, the latter
to the cognitive aspect of identifying what someone is looking at.”²³ Contextual
knowledge, as Fleck argued, is often necessary in order to be able to see: “To
see, one has first to know.”²⁴

Fleck’s view is not that different from the theories of knowledge that pre- 
vailed in previous centuries, which can be traced all the way back to classical 
antiquity. Interestingly, he illustrates the problem of seeing shapes or forms with 
the example of letters of the alphabet.²⁵ The understanding of the visual experience
was also given much thought in the Renaissance. Written text is something
to be seen, just like a picture, and both text and image were conceived as part of 
visual culture. Moreover, according to Leonardo da Vinci (1452-1519), paintings 
give “unmediated access to nature that words cannot give,” and painting thus 
constitutes a kind of universal language that can replace the written word.²⁶ The


21 This roughly corresponds to the three steps of iconology (i.e. phenomenal meaning, mean- 
ing dependent on content, and documentary meaning), or semiotic triangulation (of object, 
sign, and connotation). See Erwin Panofsky, “On the Problem of Describing and Interpreting 
Works of the Visual Arts,” trans. Jaś Elsner and Katharina Lorenz, Critical Inquiry 38, no. 3 
(2012): 482; and Lorenz, Ancient Mythological Images, 105. 

22 Ludwik Fleck, “To Look, to See, to Know [1947],” in Cognition and Fact: Materials on Lud- 
wik Fleck, ed. Robert S. Cohen and Thomas Schnelle, Boston Studies in the Philosophy of 
Science, vol. 87 (Dordrecht: D. Reidel Publishing Company, 1986), 129-51. 

23 Tim Boon et al., “A Symposium on Histories of Use and Tacit Skills,” Science Museum 
Group Journal 8, no. 8 (2020). 

24 Fleck, “To Look,” 134. 

25 Fleck, “To Look,” 131. 

26 Pamela H. Smith, The Body of the Artisan: Art and Experience in the Scientific Revolution 
(Chicago: University of Chicago Press, 2004), 92; and David Summers, The Judgment of Sense: 
Renaissance Naturalism and the Rise of Aesthestics, Ideas in Context (Cambridge: Cambridge 
University Press, 1987), 137-9. 
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concern with “how to adjust words to things, or verba to res” remained promi- 
nent well into the seventeenth century.” 

One of the reasons for the fascination with words and things (including im- 
ages) at that time was the exploration of the New World and the discoveries this 
led to. Since there were no antique sources describing the novelties that were 
being found, there were no textual authorities to verify such findings.²⁸ Another
reason was the “rise of the vernaculars” in an age of “inventorying and categorizing”
the visible world, which often meant that words did not yet exist and had to
be invented.²⁹ The complexity of the pictorial sign, however, is that the meaning it
signifies is not fixed and depends on historical and sociocultural factors.³⁰

As a result, the meaning of the constcamer with its many representations 
has been largely lost, while the image has survived. This demonstrates that the 
transfer of images as a universal language without contextual information does 
not stand the test of time. In concrete terms this means that only part of the 
iconographic significance of the constcamer can be deduced from its images. 
The remainder requires the study of various contemporary sources in order to 
penetrate into the intellectual mindset of the period in which they were made. 
The findings based on looking and seeing can be documented in a dataset, but 
not the processes of determining meaning.³¹ Interpretation is inextricably linked
to additional art historical research. 


4 Classification and identification 


The Order of Things by Michel Foucault (1926-1984) has been studied exten- 
sively in relation to museums and collections, but less so in connection with 
constcamer paintings or pictures of collections.³² Foucault’s form of historical


27 Thijs Weststeijn, “From Hieroglyphs to Universal Characters: Pictography in the Early Mod- 
ern Netherlands,” in Netherlands Yearbook for History of Art - Nederlands Kunsthistorisch Jaar- 
boek 61, ed. Eric Jorink and Bartholomeus A. M. Ramakers (Zwolle: WBooks, 2011), 239. 

28 Smith, The Body of the Artisan, 42. 

29 Weststeijn, “From Hieroglyphs,” 269. 

30 Robert S. Cohen and Thomas Schnelle, eds., Cognition and Fact: Materials on Ludwik Fleck, 
Boston Studies in the Philosophy of Science, vol. 87 (Dordrecht: D. Reidel Publishing Com- 
pany, 1986), xi-xii. 

31 This corresponds to what in iconology is called documentary meaning, or connotation in 
semiotics (see above). 

32 Most notably in Eilean Hooper-Greenhill, Museums and the Shaping of Knowledge (London: 
Routledge, 1992).


awareness is useful when dealing with such images. On systems of classification,
he famously quotes


a “certain Chinese encyclopedia” in which it is written that “animals are divided into: (a)
belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous,
(g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable,
(k) drawn with a very fine camel-hair brush, (l) et cetera, (m) having just broken the
water pitcher, (n) that from a long way off look like flies.” In the wonderment of this taxonomy,
the thing that we apprehend in one great leap, the thing that, by means of this
fable, is demonstrated as the charm of another system of thought, is the limitation of our
own, the stark impossibility of thinking that.³³


By replacing the example of a “Chinese encyclopedia” with a “constcamer paint- 
ing,” we realize that here too we are dealing with another system of thought. 

For example, fossilized shark teeth (see orange frame in Fig. 1) were found 
on beaches and thought, in the seventeenth century, to be fish tongues or 
“tongue stones.” They were categorized and depicted between other “stony” 
objects such as seashells and coral that were the subject of contemporary debates
on petrifaction.³⁴ Another example is the display of musical instruments
together with clocks. The latter are considered today purely as mechanical
devices for timekeeping, but were then treated like trumpets and violas,
associated with the greater theory of universal harmony.³⁵ The writing of history,
however, does require the “translation of past concepts and terms into
ones that can be comprehended by modern-day audiences.”³⁶ The same applies
to the transformation of constcamers and other images into data.


33 Michel Foucault, The Order of Things: An Archaeology of the Human Sciences (New York,
1971), xv; and Hooper-Greenhill, Museums, 4. 

34 Marlise Rijks, “Catalysts of Knowledge: Artists’ and Artisans’ Collections in Early Modern
Antwerp” (Ghent: Ghent University, 2016), 179 and 222-30. According to Rijks, it seemed impos- 
sible to classify coral at the time, because it was not known how it came into existence. Several 
classification suggestions were circulated (such as plant, stone, or animal), but no consensus 
was reached. See Marlise Rijks, “‘Unusual Excrescences of Nature’: Collected Coral and the 
Study of Petrified Luxury in Early Modern Antwerp,” Dutch Crossing 43, no. 2 (2019): 140. 

35 See for example the Allegory of Hearing, part of The Five Senses mentioned in the introduction. 
36 Adam Mosley, “‘Sundials and Other Cosmographical Instruments’: Historical Categories 
and Historians’ Categories in the Study of Mathematical Instruments and Disciplines,” in The 
Whipple Museum of the History of Science, ed. Joshua Nall, Liba Taub, and Frances Willmoth 
(Cambridge: Cambridge University Press, 2019), 80.


Fig. 1: Frans Francken II, Cabinet of Art and Curiosities, ca. 1620-1625. Oil on panel, 74 x
78 cm. Vienna, Kunsthistorisches Museum of Vienna, Gemäldegalerie, Inv. no. 1048.
© Wikimedia Commons, accessed September 28, 2020,
https://commons.wikimedia.org/wiki/File:Frans_Francken_(II),_Kunst-_und_Rarit%C3%A4tenkammer_(1636).jpg;
painting © Kunsthistorisches Museum Vienna, CC BY-NC-SA 4.0, www.khm.at/de/object/912d2b1c7b.
The shark tooth has been highlighted with the orange box by the author.


5 Paintings as data 


The question of how art historical objects and images can be converted into con- 
cepts and terms that can be understood by today’s audience and, moreover, can 
be processed digitally, is one that was already being asked over fifty years ago. 


One way to bring an ideal system down to reality is to ask ourselves three questions. 
Once the program for the system is outlined, who will make it, who will use it, and who
will maintain it? [. . .] The second question, “Who will use the archive?,” is prompted by
a slogan found on the walls of many computer centers. It reads, “Your formula for failure
is to try to please everybody.”³⁷


Taking this advice to heart, I opted to cater mainly to my own needs. My dataset
is set up so that it can easily be shared and used by others, but this was never a
primary concern when making technical choices regarding, for example, its structure
and format. Other potential users of this dataset will have
their own equally specific needs, so it is not up to me to dictate their process. 
However, there are examples and best practices we can learn from. 

As we have seen, present-day digital art history projects often focus on the
contextual information that surrounds works of art, for example when conducting
art market studies and provenance research.³⁸ This is understandable from
a data point of view, since context usually deals with text and numbers rather
than images.³⁹ Projects that, on the other hand, include the iconographic meaning
of artworks to a greater or lesser degree are often related to museums. Online
museum catalogs such as those of the Rijksmuseum and the Walters Art
Museum sometimes indicate what is depicted in the online images of works
from their collections.⁴⁰ In this way, users are given additional ways to search
and explore the data, but this is nowhere near the level of detail required for art
historical research.⁴¹


37 Kenneth Lindsay, “Computer Input Form for Art Works: Problems and Possibilities,” in 
Computers and Their Potential Applications in Museums: A Conference Sponsored by the
Metropolitan Museum of Art, 1968 (New York: Arno Press, 1968), 21-2. For more recent approaches,
see especially Ross Parry, ed., “(Part One) Information: data, structure and meaning,” in Muse- 
ums in a Digital Age, Leicester Readers in Museum Studies (London: Routledge, 2010), 10-115. 
38 Examples of such projects are the London Gallery Project and Mapping Titian, respectively, 
accessed September 28, 2020, http://learn.bowdoin.edu/fletcher/london-gallery/; and http:// 
www.mappingtitian.org/. 

39 Furthermore, “the lack of trained individuals to describe visual content is a continuing im- 
pediment to providing access to photographic and other visual collections,” as noted in Joan 
E. Beaudoin, “Describing Images: A Case Study of Visual Literacy among Library and Informa- 
tion Science Students,” College & Research Libraries 77, no. 3 (2016): 389. 

40 Getty Foundation, Museum Catalogues in the Digital Age: A Final Report on the Getty Foun- 
dation’s Online Scholarly Catalogue Initiative (OSCI) (Los Angeles: Getty Foundation, 2017); 
and Claire Quimby, Digital Catalogues Study: A Cross-Institutional User Study of Online Museum 
Collection Catalogues (Chicago: Art Institute of Chicago, 2019), https://digpublishing.github. 
io/catalogues-study/. See for example http://hdl.handle.net/10934/RM0001.collect.96871; and 
https://art.thewalters.org/detail/14623/the-archdukes-albert-and-isabella-visiting-a-collectors- 
cabinet/, all accessed October 16, 2020. 

41 This statement is based on my extensive research in 2017 into the usability of datasets, such 
as those of the Rijksmuseum and Metropolitan Museum of Art, for answering art historical


One of the online projects that bring together and present art historical
data from numerous museum and other collections is the website janbrueghel.net.
This website offers a complete catalog of the works of Jan Brueghel I and
includes two companion sites dedicated to Pieter Bruegel I (Jan I’s father) and
the Brueghel family.⁴² Together they are “meant to provide ways of furthering
our understanding of how the Brueg(h)el family produced a complex body of
interconnected work.”⁴³ The catalog entries are sometimes accompanied by a
discussion section that offers valuable insights into past and present scholarly 
debates. While tags are a means of roughly indicating what the artworks repre- 
sent, image annotation is not the main concern of this particular website. 

The Wikimedia Commons website, on the other hand, has implemented a 
different solution to annotating and referring to other Wikimedia image entries. 
Its online image of the constcamer painting Cabinet of Art and Curiosities (ca. 
1620-1625) (Fig. 1), for example, is supplemented with several annotations that 
become visible when moving the mouse pointer over the image.⁴⁴ These mouseovers
show either a text or an image, notably of the paintings represented in the
constcamer, and clicking on one of these takes the user to the Wikimedia entry
for that specific artwork.⁴⁵

Wikimedia’s annotations are an elegant solution, but the inclusion of text 
that can be entered freely results in descriptions such as “? Mitra cardinalis” and 
“probably some Amphidromus” regarding the seashells on display in Fig. 1. From 
a computational point of view it would be desirable to structure such data by 
using controlled vocabularies, so that all depictions annotated with Mitra cardi- 
nalis are understood as the same type of seashell. When in doubt about what 
kind of seashell is represented, it would be more reasonable to simply annotate 
“seashell” instead of including a question mark in the annotation. 
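
The preference for controlled terms over free text can be illustrated with a minimal Python sketch. It assumes a tiny hand-made vocabulary: the dictionary `CONTROLLED_TERMS`, the list of uncertainty markers, and the function name are all hypothetical and are not part of Wikimedia Commons or the Getty Vocabularies. Annotations that signal doubt fall back to the broader term “seashell,” while confident ones are mapped to a single preferred label.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted lowercase spellings.
CONTROLLED_TERMS = {
    "Mitra cardinalis": {"mitra cardinalis"},
    "Amphidromus": {"amphidromus"},
}

# Broader term used whenever the identification is doubtful or unknown.
FALLBACK_TERM = "seashell"

# Assumed signals of doubt in free-text annotations.
UNCERTAINTY_MARKERS = ("?", "probably", "possibly", "perhaps")


def normalize_annotation(free_text: str) -> str:
    """Map a free-text seashell annotation to a controlled term."""
    text = free_text.strip().lower()
    # Any sign of doubt falls back to the broader term.
    if any(marker in text for marker in UNCERTAINTY_MARKERS):
        return FALLBACK_TERM
    # Confident annotations are mapped to their preferred label.
    for preferred, variants in CONTROLLED_TERMS.items():
        if text in variants:
            return preferred
    # Terms outside the vocabulary are also recorded under the broader term.
    return FALLBACK_TERM
```

Under this rule, both “? Mitra cardinalis” and “probably some Amphidromus” would be recorded as “seashell,” so that uncertain identifications are never counted as a specific type of shell.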


questions. Some of the results can be found on “Visualizing Visions”, accessed October 18, 
2020, http://visualizingvisions.com/. 

42 Elizabeth Honig, “Jan Brueghel,” University of Maryland, Baltimore, accessed September 28, 
2020, http://www.janbrueghel.net/. 

43 Elizabeth Honig, “Pieter Bruegel” and “Brueghel Family,” University of Maryland, Balti- 
more, http://pieterbruegel.net/; and http://brueghelfamily.net/, both accessed September 28, 
2020. 

44 Wikimedia Commons, “Chamber of Art and Curiosities,” accessed September 28, 2020,
https://commons.wikimedia.org/wiki/File:Frans_Francken_(II),_Kunst-_und_Rarit%C3%A4tenkammer_(1636).jpg.

45 One of the small portraits on the left, for example, links to Peter Paul Rubens’ Abraham
Ortelius, available at
https://commons.wikimedia.org/wiki/File:Abraham_Ortelius_by_Peter_Paul_Rubens.jpg,
accessed January 7, 2021.


The Getty Vocabularies are usually at the basis of digital art history projects
dealing with datasets.⁴⁶ These controlled vocabularies are reference works that
contain structured terminology for categorizing works of art and architecture (in
the Art & Architecture Thesaurus, or AAT), their creators and current owners (in
the Union List of Artist Names, or ULAN), and associated geographic names (in the
Getty Thesaurus of Geographic Names, or TGN). These vocabularies have been in
development since the late 1960s for museum cataloging and information retrieval.⁴⁷
It is important to keep in mind, however, that historical terms and concepts
are not necessarily part of the vocabularies. For example, “fish tongues” or
“tongue stones” are not included in the Getty’s AAT, but “shark teeth” are, yet
without reference to earlier interpretations.⁴⁸


Fig. 2: Jan Brueghel II, Allegory of Sight (Venus and Cupid in a Picture Gallery), ca. 1660. Oil
on copper, 58.1 x 89.7 cm. Philadelphia, Philadelphia Museum of Art, Inv. no. 656.
© Wikimedia Commons, accessed September 28, 2020,
https://commons.wikimedia.org/wiki/File:Jan_Breughel_II_-_Allegory_of_Sight_-_gallery_painting_Cat656.jpg;
painting © Philadelphia Museum of Art, accessed September 28, 2020,
https://www.philamuseum.org/collection/object/102459.
The sector has been highlighted with the rectangular box by the author.


46 Diane M. Zorich, Transitioning to a Digital World: Art History, Its Research Centers, and
Digital Scholarship (The Samuel H. Kress Foundation & The Roy Rosenzweig Center for History
and New Media, George Mason University, 2012); and Patricia Harpring, Introduction to
Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works, 1st ed. (Los
Angeles, CA: Getty Research Institute, 2010).

47 Brown, The Routledge Companion, 440.


6 Constcamer paintings as data 


Annotating paintings can be a complex task and involves collecting metadata, 
then breaking down the content of the images into thematic and iconographic 
elements. Conceptually, my dataset consists of “entities” and “links”: an entity 
can be connected to another entity through such a link. For example, the con- 
stcamer painting entitled Allegory of Sight (Venus and Cupid in a Picture Gallery) 
(Fig. 2) is an entity. Another entity is the painting’s artist, the person Jan Brue- 
ghel II (Jan I’s son). These two entities are connected to each other by means of 
the link type “creator.” In this way it is documented that the Allegory of Sight 
was created by Jan Brueghel II, the Flemish painter and draftsman who lived 
from 1601 to 1678.⁴⁹ It is useful to refer to Jan Brueghel II’s record in the Getty’s
ULAN because his name can be written in many ways but, with the ULAN, we
know exactly which artist is meant.⁵⁰

The same method is used to annotate what is depicted in a constcamer 
painting, only this time with the link type “depicts.” The entity Allegory of Sight 
depicts among other things the entity “sector.” This term can mean different 
things, and therefore reference is made to a specific Getty AAT record that de- 
scribes sectors, in this context, as “proportional measuring gauges consisting 
of two straight, metal bars hinged at one end and graduated for measuring; 
used in clockmaking” (see Fig. 2).⁵¹ By the end of the sixteenth century, the period
of its invention, the main use of the sector was to solve mathematical problems,
and the design of the instrument was continuously improved upon, but
this aspect is not captured by the Getty Vocabularies.
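
The entity-and-link structure described above might be modeled as follows. This is a minimal sketch in Python and not a description of the software actually used for the dataset; the class names, field names, and the helper `targets` are illustrative assumptions, while the two link types and the Getty record URLs are those cited in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    """Anything that can be described: a painting, a person, a depicted object."""
    name: str
    authority_uri: Optional[str] = None  # e.g. a Getty ULAN or AAT record


@dataclass(frozen=True)
class Link:
    """A typed connection from one entity to another."""
    source: Entity
    link_type: str
    target: Entity


painting = Entity("Allegory of Sight (Venus and Cupid in a Picture Gallery)")
artist = Entity("Jan Brueghel II", "http://vocab.getty.edu/page/ulan/500013747")
sector = Entity("sector", "http://vocab.getty.edu/page/aat/300201680")

links = [
    Link(painting, "creator", artist),
    Link(painting, "depicts", sector),
]


def targets(source: Entity, link_type: str, all_links: List[Link]) -> List[Entity]:
    """Return every entity reached from `source` via links of the given type."""
    return [
        link.target
        for link in all_links
        if link.source == source and link.link_type == link_type
    ]
```

Calling `targets(painting, "depicts", links)` then returns the sector entity together with its AAT reference, which is what disambiguates the term from its other meanings.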


48 Getty Research Institute, “Shark Teeth” (AAT), accessed October 16, 2020, http://vocab. 
getty.edu/page/aat/300379302. 

49 Getty Research Institute, “Brueghel, Jan, the younger” (ULAN) accessed September 28, 
2020, http://vocab.getty.edu/page/ulan/500013747. 

50 Whenever an artist is missing from the Getty’s ULAN, the Netherlands Institute for Art His- 
tory’s online resource “RKD Explore” is used as the authority instead. See for example, The 
Netherlands Institute for Art History (RKD), “Jan Breughel (II),” accessed September 28, 2020, 
https://rkd.nl/explore/artists/13289. 

51 Getty Research Institute, “Sectors” (AAT), accessed September 28, 2020,
http://vocab.getty.edu/page/aat/300201680.


Additionally, the list of terms in other languages provided by Getty’s AAT is
far from comprehensive. The sector is referred to in French as the compas de proportion,
in German as the Proporzionalzirkel, in Dutch as the proportionaalpasser,
and in Italian as the compasso di proporzione. The proportional compass, however,
is known in French as the compas de réduction, in German as the Reduktionszirkel,
in Dutch as the reductiepasser, and in Italian as the compasso di riduzione.⁵² To
complicate matters even further, each inventor who developed a variation on the
sector around the year 1600 also gave their invention a new name.
Thomas Hood (ca. 1556-1620) was the first to call his instrument a sector, inspired
by Euclid’s Elements, while Michiel Coignet (1549-1623) speaks of his pantometre,
and Muzio Oddi (1569-1639) of his compasso polimetro.⁵³
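
One way to cope with such a proliferation of names is to keep an index that resolves every known variant to a single entity. The following Python sketch assumes that all names listed above are treated as aliases of either “sector” or “proportional compass.” This simplifies the historical picture, since the instruments of Hood, Coignet, and Oddi were distinct variations, and the names `ALIASES`, `build_index`, and `resolve` are hypothetical.

```python
from typing import Dict, List, Optional

# Variant names taken from the paragraph above, grouped under one entity each.
ALIASES: Dict[str, List[str]] = {
    "sector": [
        "compas de proportion",     # French
        "Proporzionalzirkel",       # German
        "proportionaalpasser",      # Dutch
        "compasso di proporzione",  # Italian
        "pantometre",               # Michiel Coignet's name
        "compasso polimetro",       # Muzio Oddi's name
    ],
    "proportional compass": [
        "compas de réduction",      # French
        "Reduktionszirkel",         # German
        "reductiepasser",           # Dutch
        "compasso di riduzione",    # Italian
    ],
}


def build_index(aliases: Dict[str, List[str]]) -> Dict[str, str]:
    """Create a case-insensitive lookup from every variant to its entity."""
    index: Dict[str, str] = {}
    for entity, names in aliases.items():
        index[entity.casefold()] = entity
        for name in names:
            index[name.casefold()] = entity
    return index


INDEX = build_index(ALIASES)


def resolve(term: str) -> Optional[str]:
    """Return the entity a term refers to, or None if the term is unknown."""
    return INDEX.get(term.strip().casefold())
```

Keeping the historical names as aliases rather than discarding them preserves the period terminology alongside the modern label.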

Nevertheless, the entity “sector” provides a basis for mapping and compar- 
ing all instances of representations of this type in constcamer paintings. Such 
annotations are the result of looking and seeing understood as the cognitive 
identification of what we are looking at. In order to determine meaning we 
need to broaden our view and take into account not only the realistic, but also 
the allegorical qualities of a constcamer painting such as the Allegory of Sight. 
Its overall theme is the sense of sight, the most important of the five classical 
senses (i.e. sight, hearing, taste, touch, and smell).⁵⁴ The inclusion of a mathematical
instrument such as the sector in this painting suggests a symbolic significance
of the instrument as an aid to vision or a tool to improve sight.


7 The constcamer dataset: Possibilities 
and limitations 


The sector is just one small representation — of about 3.8 by 3.6 centimeters — 
amid many others in the Allegory of Sight. Each of the represented objects, 


52 Ad Meskens, “Michiel Coignet’s Contribution to the Development of the Sector,” Annals of 
Science 54 (1997): 143. See also, Getty Research Institute, “Proportional Compasses” (AAT), 
accessed September 28, 2020, http://vocab.getty.edu/page/aat/300022492; and https://cata 
logo.museogalileo.it/approfondimento/Compasso.html. 

53 Robert Bud and Deborah Jean Warner, eds., Instruments of Science: An Historical Encyclo- 
pedia, Garland Encyclopedias in the History of Science, vol. 2 (New York: Garland Publishing, 
1998), 527; and Filippo Camerota, The Geometric and Military Compass of Galileo Galilei, ed. 
Filippo Camerota and Giorgio Strano, Scientific Instruments: History, Exploration, Use 1 (Flor- 
ence: Scatolificio Isolotto, 2004), 62. 

54 Charles M. Peterson, “The Five Senses in Willem II van Haecht’s Cabinet of Cornelis van der 
Geest,” Intellectual History Review 20, no. 1 (2010): 105-9. 
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animals, plants, people, and interior and exterior elements have stories of their 
own. This abundance of data can be effectively collected and archived in a rela- 
tional database management system. My project makes use of a no-code develop- 
ment platform (NCDP), which is database management software with a graphical 
user interface.” Currently, my dataset holds 161 constcamer paintings in the form 
of images and associated information. These give rise to approximately 3,400 en- 
tities that are connected to each other via 13 link types. In total I have recorded 
about 12,700 such connections between these entities. 
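
The structure described above (entities joined by typed links) can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not the schema of the no-code platform used in the project. The table names, column names, and the "depicts" link type are assumptions made for the example, since the 13 link types are not listed here.

```python
import sqlite3

# Hypothetical schema: all table, column, and link-type names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    kind TEXT NOT NULL
);
CREATE TABLE link_type (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE link (
    source_id INTEGER NOT NULL REFERENCES entity(id),
    target_id INTEGER NOT NULL REFERENCES entity(id),
    link_type_id INTEGER NOT NULL REFERENCES link_type(id)
);
""")

# One painting, one depicted object, and one typed connection between them.
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [(1, "Allegory of Sight", "painting"), (2, "sector", "instrument")],
)
conn.execute("INSERT INTO link_type VALUES (?, ?)", (1, "depicts"))
conn.execute("INSERT INTO link VALUES (?, ?, ?)", (1, 2, 1))


def depicted_in(connection, label):
    """Return the labels of all entities linked to `label` via 'depicts'."""
    rows = connection.execute(
        """
        SELECT p.label
        FROM link
        JOIN entity p ON p.id = link.source_id
        JOIN entity e ON e.id = link.target_id
        JOIN link_type t ON t.id = link.link_type_id
        WHERE e.label = ? AND t.name = 'depicts'
        ORDER BY p.label
        """,
        (label,),
    ).fetchall()
    return [row[0] for row in rows]


print(depicted_in(conn, "sector"))
```

Keeping link types in their own table means a new kind of relation can be added as a row rather than as a change to the schema.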

The constcamer dataset describes the contents of pictures of collections. 
These paintings provide insight into contemporary thoughts on the organization 
of items included in collections and the associated meanings they represented.56
Having to be precise when naming the individual entities depicted in constcamer 
paintings actually leads to improved vision. A shark tooth or sector can easily be 
overlooked, but this is less likely when applying a label to each representation in 
a painting. In this way annotation promotes accuracy, which generates a more 
extensive overview of what is displayed in the seventeenth-century pictures of 
collections. Moreover, by looking at constcamer paintings collectively, repetitions 
of subject matter and certain entities can readily be observed. 

At the same time, there are the issues of transformation and translation. As 
we have seen, the dataset requires a transformation of constcamer paintings 
into data. These data are a modern interpretation of the pictorial content and 
require additional translation to expose historical and ideological meanings. 
The constcamer dataset is therefore not an interpretation or conclusion in itself, 
but rather a starting point for further analysis. 


8 Conclusion 


Constcamer paintings are rich and varied images whose content can be “re- 
viewed” via a dataset. While the dataset is an integral part of digital art his- 
tory, the analysis and annotation of images is currently an underrepresented 
area of work in this field. One of the main reasons for this is that a transfor- 
mation is needed to turn artworks into representative digital equivalents. A 


55 For a brief overview of NCDPs, see, https://www.g2.com/categories/no-codedevelopment- 
platforms, accessed September 28, 2020. 

56 See especially “Vorrede - Das Objekt als Symbol” in Andreas Grote, ed., Macrocosmos in
Microcosmo: Die Welt in der Stube: Zur Geschichte des Sammelns, 1450 bis 1800, Berliner 
Schriften zur Museumskunde, vol. 10 (Opladen: Leske + Budrich, 1994), 11-9. 
further difficulty is that images from bygone eras reflect systems of thought
that are different from our own. The ensuing process of translation results in 
a mediated access to the content of the images, the meaning of which can 
only be determined on the basis of knowledge of the contexts in which art- 
works and their creators existed. 

The annotation of constcamer paintings by means of controlled vocabularies 
enables the retrieval of information by expert and non-expert users alike. This in- 
formation is collected by looking, and by identifying what we see. A dataset makes 
it possible to archive a large number of identifications and the relations between 
these identifications, as well as to share them with the scholarly community and 
an interested audience. The constcamer dataset is thus a tool that allows for better 
vision. The act of interpretation, or understanding what a certain representation
means, is not recorded digitally because, as is wisely inscribed on a piece of
paper depicted in The Interior of a Picture Gallery with Personifications of Pictura
and Disegno (ca. 1630) by an unknown Flemish painter, “aly et alia vident” or
“others see it yet otherwise.”57
Sytze Van Herck
Historians as computer users 


Organizing sources with digital asset management 
for a history of computing 


Mass digitization and born-digital sources have changed the work of historians, 
archivists, and museologists. Many historians spend less time in the archives, 
instead photographing or scanning sources on short research visits. In other 
cases archivists and museologists create online repositories to make sources 
more accessible. But what happens to all the photographs and scans research- 
ers collect? This chapter studies how data management and curation influence 
historical research with textual, (audio)visual, and material sources regarding 
the history of the design and use of computing devices. 

Placing computers into a broader context, my research focuses on the soci- 
etal, business, and labor developments that are at the basis of computing devi- 
ces. My PhD dissertation deals with the emergence of new occupations and 
changes in existing labor structures in relation to different computer models. 
Computers have also had an impact on both the workspace and the workflow of 
the user. Lastly, marketing has created an image of the idealized and arche- 
typal user and has influenced gender stereotypes. To illustrate the evolution of 
the design and use of computer devices, my dissertation is based on objects 
and their representation in images, audiovisual sources, texts, and other sour- 
ces such as drawings. 

Since my primary sources were located across Europe and the United States, 
the time I could spend in museums and archives was limited. Rather than com- 
bining source collection and analysis during my research visits, I instead used 
these visits to briefly look at the objects on display, or sources in the archive cat- 
alog, to decide whether or not to digitize. Section 1 therefore looks into the differ- 
ent approaches for selecting sources before, during, and after museum and 
archive visits. Additionally, each section of this chapter discusses one or more of 
the FAIR Guiding Principles for scientific data management.1 FAIR stands for findable,
accessible, interoperable, and reusable, and these principles can be used to
evaluate existing research datasets such as museum and archive catalogs, or as 
selection criteria for research data management tools. Section 1 places a particular
focus on the first two FAIR principles: findable and accessible data.

After spending a day in the museum or archive, I usually amassed between 
200 and 700 photographs, depending on the number of objects and archival 
sources I selected and photographed. Seven museum and seven archive visits 
have resulted in a total of over 18,000 photographs. In order to remember 
when, where, and what was digitized, the structure and organization of notes, 
photographs, and/or scans was essential. The metadata needed to be rigid 
enough to relocate a source, yet flexible enough to reorganize and recombine 
sources to facilitate analysis. 

Section 2 discusses the digitization process and is followed in Section 3 by 
an analysis of (meta)data structure and documentation practice using the Tropy 
digital asset management tool. Section 4 focuses on the influence of the interface
and algorithms on the analysis stage of research, as well as on the final FAIR 
principles of data being interoperable and reusable. The chapter concludes with 
a reflection on the notion of the “original,” the consequences of curating or se- 
lecting sources, the value of digital asset management, and the human factors 
that influence digital historical research. 


1 Selecting sources 


Since my research covers an array of machines, users, and applications, my dis- 
sertation is composed of several historical case studies. In the first stage I se- 
lected different types of computing devices — such as punch card equipment, a 
mainframe, a minicomputer, a microcomputer, and a personal computer — which 
differed in medium, size, price, and application. After exploratory visits to sev- 
eral computer history museums in the United Kingdom, Germany, and the United 
States, I limited the selection to specific computer manufacturers and models. 
The first case study centers around Remington Rand and Powers-Samas 
equipment for punch cards used in accounting in the 1950s. The second case 
study shifts to the 1960s with IBM’s System/360 mainframe, aimed at small and 
medium-sized companies, while the third study focuses on a computer model 
from the 1970s: the PDP-11 minicomputer developed by the Digital Equipment
Corporation (DEC) and used in the aerospace industry. The final two case studies
compare a microcomputer from the United Kingdom, the BBC Micro, with the
Apple IIe personal computer from the United States, both of which were popular
in schools in the 1980s.2

The framework of life cycle studies from material culture allows a comparison
of different computing devices based on Henry Glassie’s three main contexts
of objects, namely “creation, communication and consumption.”3 These
contexts translate to the following life cycle stages for computers: design and 
manufacturing (creation), sales and marketing (communication), and installa- 
tion and use (consumption). The refined selection of case studies above formed 
the basis of my preparation for archive visits. 

Reviewing the policies of the archive or museum in detail before a visit is 
paramount, in order to check whether scans or photographs are allowed, what 
has already been digitized, and how the collection is organized.4 Limiting the
research question to a small number of case studies helped me select only rele- 
vant objects and sources. Ideally, a museum or archive’s online catalog allowed 
browsing, filtering, and search. Finding aids were especially helpful to get an 
idea of the material an archive offered. In large collections, even specific searches 
sometimes generated too many results, so filtering could refine results further.5
Reviewing the temporary and permanent exhibitions in advance also made for a 
more efficient museum visit. 

For both museum objects and archival records, most items are not on dis- 
play but located in storage, either on-site or further away. Furthermore, ar- 
chives have not usually cataloged their entire holdings and certain (sensitive) 
information might even be withheld from the publicly available online catalog. 
The internal catalog of the institution often contains more extensive metadata. 
Metadata is “data about (digital) data or any physical or conceptual object.”6 As
Leonie Hannan and Sarah Longair remind researchers in their research guide, 
“the composition of collections is shaped by a variety of factors including the 
purpose for its foundation and the original contents; decisions made by collectors,
donors, and curators; the institution’s changing priorities and locality; and
its resources and size.”7

As Gerben Zaagsma puts it, “digitisation is about selection” based on a va- 
riety of criteria which are rarely made explicit and certainly influenced by costs 
or funding. At some of the museums, the enthusiasm of volunteers and staff — 
and their preference for certain computer models — together with the availabil- 
ity of sources that provided a glimpse of users from the past, certainly influ- 
enced my selection of case studies. Additionally, my time in archives across the 
world was limited as I had a generous but not unlimited amount of funding — 
and this too impacted my selection.9

Unlike library catalogs, which often use the same metadata format (MARC, or 
machine-readable cataloging) to develop online public access catalogs (OPACs) - 
and for which global catalogs such as Worldcat allow researchers to search beyond 
a single institution - museum and archive catalogs are rarely searchable through a 
single global catalog due to the diverse nature of items in the collections. Many 
initiatives such as the Conceptual Reference Model of the International Committee for
Documentation (CIDOC-CRM), which is a Linked Open Data ontology to describe 
metadata, in combination with overarching collections like Europeana, attempt to 
standardize metadata across cultural heritage institutions and thus create search 
portals beyond single institutions.10 However, each institution I visited used a different
portal. As Fotis Lazarinis concludes after listing the advantages of OPACs
over card catalogs, “encoding data using the same format promotes interoperabil- 
ity among digital library tools.” 

Some catalogs accommodated browsing, whereas the interface of others was 
only meant for search. Browsing allows researchers to discover sources they 
were not necessarily looking for and provides an overview of the institution’s col- 
lections and structure. Searching, however, is only useful if you know what you 
are looking for. The Computer History Museum in Mountain View, California 
made archival finding aids available as PDFs describing each collection donated to 
the museum and listing nearly all individual items, thus allowing me to explore its 
content before my visit.12 Another unexpected find occurred while I was browsing
the Centre for Computing History’s website, where I discovered a single box filled
with documents from a former Helena Rubinstein employee which provided the 
foundation for an entire case study.13

Helena Rubinstein was one of three influential female business owners in 
the cosmetics industry and was described by Life Magazine in 1941 as belonging 
to the “matriarchy of the beauty industry.”14 The company’s billing departments
in the United States and United Kingdom used punch card equipment and Ma- 
dame Helena Rubinstein helped to promote this technology in the 1940s and 
1950s. Since both advertisements and company documents have survived, com- 
paring the changing gender composition of the beauty industry and the punch 
card industry brought to light some interesting parallels. 

Other catalogs, such as the OPAC of the Living Computers Museum
+ Labs, were difficult to locate on the museum’s website and had been designed
for searching rather than browsing.15 Browsing through finding aids respects
the original order of a collection and is thus particularly useful in the explor- 
atory phase of a study, whereas search functionality facilitates making connec- 
tions across collections, which can aid not only the selection of sources but 
also the analysis of the material. In any case, “the ability to formulate meaning- 
ful queries and an awareness of how these queries might influence the search 
results and thus the analytical outcome is essential.”16

Before each archive or museum visit I created a spreadsheet listing the col- 
lection, box, call number, date, and short title of each item of interest to my 
research. Depending on the institution’s policy I either contacted the archivist 
beforehand or when I arrived. The policy of the Computer History Museum, for 
instance, only allowed researchers to consult a maximum of ten boxes per day, 
meaning I had to pare down my original selection. Choosing only those boxes 
with either very important or many items per box, I managed to shorten my list. 
Adding simple tick boxes to my spreadsheet also helped me keep track of the 
items I selected and photographed, meaning I could just take very short notes
for each item while I was on-site. 
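
The checklist described above can be sketched as a small CSV file. This is a minimal illustration rather than the spreadsheet actually used. The column names follow the fields listed in the text, the sample rows are placeholders, and the rule of keeping the boxes with the most items is a simplification that ignores the "very important" criterion, which needs the researcher's judgment.

```python
import csv
import io
from collections import Counter

# Columns follow the text; the two tick-box columns are assumed names.
FIELDS = ["collection", "box", "call_number", "date", "short_title",
          "selected", "photographed"]


def item(collection, box, call_number, date, short_title):
    """Build one checklist row with both tick boxes left empty."""
    values = [collection, box, call_number, date, short_title, "", ""]
    return dict(zip(FIELDS, values))


def mark_selected(rows, max_boxes):
    """Tick 'selected' for items in the `max_boxes` boxes holding the most items."""
    counts = Counter((row["collection"], row["box"]) for row in rows)
    keep = {box for box, _ in counts.most_common(max_boxes)}
    for row in rows:
        row["selected"] = "x" if (row["collection"], row["box"]) in keep else ""
    return rows


def to_csv(rows):
    """Serialize the checklist so it can be opened in any spreadsheet program."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows, not real catalog records.
rows = [
    item("Collection A", "1", "A-001", "1955", "Placeholder invoice"),
    item("Collection A", "1", "A-002", "1956", "Placeholder policy document"),
    item("Collection A", "2", "A-003", "1957", "Placeholder order form"),
]
rows = mark_selected(rows, max_boxes=1)
print(to_csv(rows))
```

With a daily limit such as ten boxes, `max_boxes=10` would produce a first shortlist that can then be adjusted by hand.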

As for any historical research, the selection of sources depends on the re- 
search question and scope of the project. The sources available in turn influence 
the delimitation of the research question in terms of temporal and geographical 
boundaries and thus refine the topic. For instance, I originally intended to cover 
the entire twentieth century but noticed that the case studies differed substan- 
tially after the 1980s and required more context regarding networks and the ad- 
vent of the World Wide Web. Furthermore, computing devices in the first half of 
the twentieth century revolved around improvements in punch card machines or 
single-purpose installations for defense or research projects, which have already 
been extensively researched and had fewer users. Therefore, the temporal delimi- 
tation of my research question was reduced from the entire twentieth century to 
the period between the 1940s and the 1980s. 


2 Digitizing sources 


Many historians rarely mention the technical setup and equipment they use dur- 
ing archive visits. In this section I intend to make my process more explicit, 
given my interest in the use of technology both as a subject of study and as a 
user myself. I digitized most sources on-site. Due to the large variety of material 
objects, textual, and visual sources, a camera was more appropriate than a porta- 
ble scanner. Although our media center provides compact digital cameras, I pre- 
ferred my own camera for its SR+ setting which “automatically optimizes settings 
to suit the scene” and its TEXT setting that ensures clear images of text or drawings
in print.17 The lighting in archive and museum settings and the limited time
frame of research visits, as well as a lack of professional photography training, 
resulted in acceptable rather than professional digitization. Aside from a camera, 
I took along an SD memory card reader and an external hard drive to back up the 
photographs, a ReMarkable tablet to take notes, a laptop to go over my list of 
items and quickly look up additional information, and chargers for each device, 
just in case. In museums I also used my smartphone to record short videos of 
functioning computers. 


17 “FujiFilm Digital Camera X20 Owner’s Manual,” 32, 34, 38, accessed September 28, 2020, 
At the Centre for Computing History in Cambridge in the United Kingdom I
established a useful workflow for documenting a single machine. First, an over- 
view photograph showed the setup of a computer, then I photographed each 
component or peripheral such as the screen, keyboard, or mouse, followed by 
the object description. In some cases, exceptional inscriptions or signs of wear 
were also interesting, such as the names of schools printed on some of the BBC 
Micro computers. Perspective was also important, to convey the dimensions of 
the object. 

Including the folder or item number where possible, at least in the first
and/or last photograph of the same source, helps to distinguish photographs of
archival sources later on. In addition, respecting the order of a source by photo- 
graphing one page, folder, or object after another saves time in comparing 
notes to photographs. At the end of each day, I copied all the images including 
metadata into the correct folder (one for each museum or archive) on my laptop 
and immediately created a backup on an external hard drive. At first I made the 
mistake of renaming files according to the model of the computer being photo- 
graphed, but quickly realized that this disrupted the order of the photographs. 
Adding tags to files directly, without a clear ontology or structure in place to
ensure consistency, proved disorganized, and the document tags were difficult to
export. Because of these difficulties I switched to dedicated software to organize
the photographs.
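
The end-of-day routine described above (copy into one folder per institution, back up immediately, and leave file names untouched so that the shooting order survives) can be sketched as follows. This is a minimal illustration using only the Python standard library. The folder layout and the DSCF file names in the demonstration are assumptions, and the demonstration runs in a temporary directory with empty placeholder files.

```python
import shutil
import tempfile
from pathlib import Path


def archive_day(card_dir, institution_dir, backup_dir):
    """Copy one day's photographs, unrenamed, to the laptop folder and a backup."""
    copied = []
    for src in sorted(Path(card_dir).iterdir()):
        if not src.is_file():
            continue
        for dest_root in (Path(institution_dir), Path(backup_dir)):
            dest_root.mkdir(parents=True, exist_ok=True)
            # copy2 also attempts to preserve file timestamps.
            shutil.copy2(src, dest_root / src.name)
        copied.append(src.name)
    return copied


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    card = Path(tmp) / "card"
    card.mkdir()
    for name in ("DSCF0002.JPG", "DSCF0001.JPG"):
        (card / name).touch()

    laptop = Path(tmp) / "laptop" / "Example Museum"
    backup = Path(tmp) / "backup" / "Example Museum"
    copied = archive_day(card, laptop, backup)
    backed_up = sorted(p.name for p in backup.iterdir())

print(copied)
```

Because the camera's sequential file names are kept, sorting by name reproduces the order in which pages and objects were photographed.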

Among the most challenging sources to digitize were slides from the
1970s in the DEC collection of the Computer History Museum.19 Using a light
box was useful for capturing the image of the slides but obscured the handwritten
text on the frame. In the worst instances this meant I had trouble identifying
the subject, location, or date of the slide later on. Digitizing the museum
objects was challenging for several reasons. Firstly, in crowded museums, visitors
were sometimes accidentally included in the photographs and needed to be
cropped out later. Secondly, images of reflective surfaces such as glass display
cases or computer screens could not be included if they showed visitors or the
photographer/researcher. Finally, when the battery of the camera ran low or
my muscles became sore and the images were not sharp, some text was no longer
legible and some sources became unusable.


19 Bo Doub, Kim Hayden, and Sara Chabino Lott, “Guide to the Digital Equipment Corpora- 
3 Documenting collections


Choosing a tool or method always depends on what works for the project. Col- 
laborative projects need software that allows each team member to interact 
with documents on a shared server or in a cloud infrastructure, whereas an in- 
dividual researcher dealing with sensitive or copyrighted material certainly 
cannot share files and needs sufficient storage on a single device. For some re- 
searchers, combining images into a single PDF and adding each to a bibliogra- 
phy management tool such as Zotero is the easiest and most efficient solution. 
Others choose to build their own dataset or turn to other digital asset management
software.20

When selecting a tool or method it is essential to first determine the re- 
quirements, keeping in mind the budget, digital literacy, and time investment. 
Since annotating images is just one step in the research project, it is important 
to consider whether the tool fits the research ecosystem. Criteria for evaluating 
the suitability of a tool include, for example, the workflow, compatibility with 
other research tools, or integration into a website. 

Tropy is one of the first research photo management tools specifically designed
for historians. Digital asset management software in general helps researchers to
annotate, structure, and recombine not only text or photographs, but also
audio, video, and even 3D renditions of objects. For now, Tropy stores files locally,
which means that neither synchronization between different machines nor collaboration
with others is possible. Fortunately, an expansion of the range of
media that Tropy can import, and the addition of remote access and cloud storage,
are currently under development. The strength of digital asset management
tools lies in navigating large photo collections with powerful searching,
tagging, and annotation features. Although organizing sources, adding metadata,
tagging, and transcribing is time consuming, digital asset management
tools definitely save time in the crucial final writing phase.

In the end, Tropy suited the project best, mostly because the photographs 
remain in their original folders, and only a small tile of the image is uploaded. 
Therefore the software does not duplicate the files, thus saving precious storage 
space. The tool was developed by the Roy Rosenzweig Center for History and 
New Media, which also developed Zotero. A collection in Tropy contains items 
that in turn consist of one or several images. 

I chose to separate photographs into projects based on the subject of each 
of my dissertation chapters. For the first collection in which I tested the tool I 
browsed all images from the archives and museums I visited and wrote down
the file names of the images from the original folder that were selected for the 
collection. Because the file path needs to remain the same to display the origi- 
nal image in the item view, I created a designated folder that I used only for 
copies of the selected photographs from the external solid-state drive (SSD) 
while still continuing the file structure according to the name of the institution. 

I changed one crucial step in this selection process for all other collections 
after the first trial. Rather than repeatedly browsing all photographs whenever 
starting a new collection, it proved more efficient to browse everything only 
once and to take note of the file name, the designated collection, and a short 
title, including the date where possible. For this selection process to be ergo- 
nomic I used a larger monitor and on one side of the 24-inch display I browsed 
the images, while on the other side I checked archive notes via my MacBook 
using the monitor’s Picture in Picture (PIP) option. 

After this selection process, there were two potential approaches to upload- 
ing the images to Tropy. Photographs can be added all at once through the im- 
port feature or simply dragging and dropping, and separated or grouped into 
items later on. Alternatively, an item and its metadata can be created first, be- 
fore any photographs are added to a single item. The first method is more effi- 
cient for a large number of items containing only a small number of images, 
such as letters consisting of one or two images, whereas the second method is 
useful for a small number of items with a large number of images attached to 
each item - in particular for user manuals consisting of up to 300 images for a 
single item. 

I created metadata templates using existing ontologies. Metadata are data 
describing data or, in the case of digital sources, data generated by the camera 
such as date and time, as well as a file name, accompanied by data added by 
the researcher including the archive or museum, the identifier, the collection 
number or call number, a title, the folder and box or location, the language, the 
manufacturer or author, the copyright holder, etcetera. Each of the optional 
metadata fields for an item can be selected based on different ontologies. I cre- 
ated a separate metadata template for each institution based on the information 
available in the catalog and any additional research needs. Indicating the cor- 
rect institution (i.e., the museum or archive) and the rights (e.g., educational 
non-commercial use only) determined by the institution’s policy was partially 
automated. 
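
The idea of a per-institution template with pre-filled values can be sketched as a plain data structure. This is a hypothetical illustration and not Tropy's own template format. The field selection, the mapping of the institution onto the Dublin Core "source" element, and the institution name are assumptions made for the example; the property URIs themselves are the Dublin Core Metadata Element Set, version 1.1.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical template: field choice and default values are illustrative.
EXAMPLE_TEMPLATE = {
    "name": "Example Museum Archive",
    "fields": [
        {"name": "title", "property": DC + "title"},
        {"name": "date", "property": DC + "date"},
        {"name": "identifier", "property": DC + "identifier"},
        {"name": "language", "property": DC + "language"},
        # Pre-filled fields: the same for every item from this institution.
        {"name": "source", "property": DC + "source",
         "default": "Example Museum Archive"},
        {"name": "rights", "property": DC + "rights",
         "default": "Educational non-commercial use only"},
    ],
}


def new_item(template, **values):
    """Create an item record from template defaults plus researcher-supplied values."""
    known = {field["name"] for field in template["fields"]}
    unknown = set(values) - known
    if unknown:
        raise KeyError(f"fields not in template: {sorted(unknown)}")
    item = {}
    for field in template["fields"]:
        item[field["property"]] = values.get(field["name"], field.get("default", ""))
    return item


leaflet = new_item(EXAMPLE_TEMPLATE, title="Placeholder leaflet", date="1940")
print(leaflet[DC + "rights"])
```

Keying each value by its property URI rather than by a local label is what allows the record to be understood outside the project in which it was created.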

When datasets adhere to the FAIR principles, they should also be interopera- 
ble. Tim Berners-Lee introduced the idea of a Semantic Web containing Linked 
Open Data in 2008 as “the idea of having data on the Web defined and linked in 
a way that it can be used by machines not just for display purposes, but also for 
automation, integration, and reuse of data across various applications.”21 The Semantic
Web is based on the Resource Description Framework (RDF) built up of
triples that contain a subject (e.g., the image DSCF7180), a predicate or property
(e.g., has the title) and an object or value (e.g., Apple IIe).

Each element of a triple can be a Uniform Resource Identifier (URI) which, 
similarly to a file path, identifies a resource on a computer network. For in- 
stance, http://purl.org/dc/elements/1.1/title is the URI referring to the property 
called “title” according to the Dublin Core Metadata Initiative (DCMI). Thus, 
by searching the DCMI’s website for this URI, other researchers or algorithms 
can find the description of the title property. The most common form of URI is a 
URL, such as https://www.dublincore.org/specifications/dublin-core/dcmi- 
terms/#http://purl.org/dc/elements/1.1/title, which links to a web page that de- 
scribes the aforementioned URI. 
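
The triple from the example can be written out with URIs in place of informal names. The sketch below is a minimal illustration that prints the statement in N-Triples, one of the plain-text notations for RDF. The subject URI under example.org is a placeholder, since a photograph on a local drive has no published identifier, while the predicate is the Dublin Core title property quoted above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the property
    obj: str        # literal value (a plain string in this sketch)


def to_ntriples(triple):
    """Render a triple with a string literal object as one N-Triples line."""
    literal = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{triple.subject}> <{triple.predicate}> "{literal}" .'


photo_title = Triple(
    subject="https://example.org/photos/DSCF7180",  # placeholder URI
    predicate="http://purl.org/dc/elements/1.1/title",
    obj="Apple IIe",
)
print(to_ntriples(photo_title))
```

The object here is a literal string; in other triples it could itself be a URI, for example one identifying the manufacturer.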

RDF can be serialized in different formats; Tropy uses JSON-LD.22
The URI is defined by an ontology which describes concepts that can be used 
to build semantic models adding meaning to data which can be reused by 
others. In other words, by adhering to established standards or URIs both fu- 
ture researchers and search algorithms can understand the meaning of meta- 
data. In Tropy’s metadata template builder I could select concepts from an 
ontology such as the DCMI metadata terms mentioned earlier, as well as RDF 
Schema.23 In theory I could also incorporate a cultural heritage ontology such
as CIDOC-CRM but the concepts were often too detailed or complex for my research
dataset.24
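
To give an impression of what a JSON-LD description looks like, the sketch below builds a minimal document with Python's json module. It is an illustration of the format only and does not reproduce the layout of Tropy's own export. The photograph's identifier under example.org is a placeholder; the "@context" entry maps the short key "title" onto the Dublin Core property URI so that software can interpret it unambiguously.

```python
import json

DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def photo_as_jsonld(photo_iri, title):
    """Describe one photograph as a minimal JSON-LD document."""
    return {
        # The context tells a JSON-LD processor what the key "title" means.
        "@context": {"title": DC_TITLE},
        "@id": photo_iri,
        "title": title,
    }


doc = photo_as_jsonld("https://example.org/photos/DSCF7180", "Apple IIe")
serialized = json.dumps(doc, indent=2)
print(serialized)
```

The result is ordinary JSON, so it remains readable for people and for tools that know nothing about RDF, while the context keeps the link to the shared vocabulary.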

Aside from the metadata based on the catalog and the content of the source, 
I added tags such as image, mainframe, or manual to allow filtering. Filtering in 
turn made it easier to relocate certain types of items or compare items with the 
same tag. The “notes” function was very useful for transcribing any text on the 
image but can also be used to formulate ideas or thoughts and will show up in 
search results. Besides transcribing sources, Tropy makes comparisons between 
sources from different institutions easier, since tags help users discover links between
sources. Nevertheless, tags also further decontextualize sources from their
original order in the archive and from the collection to which the original source 
belonged. Tagging is always a trade-off between few but broad terms, and many 
but narrow terms. 
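
The trade-off between broad and narrow tags can be made concrete with a small filtering sketch. It is a generic illustration and not Tropy's search implementation. The tags are taken from the examples in this section, while the item titles are placeholders.

```python
# Placeholder items; only the tag names come from the text.
items = [
    {"title": "Placeholder leaflet",
     "tags": {"applications and use", "equipment", "marketing", "punch card"}},
    {"title": "Placeholder user manual",
     "tags": {"manual", "mainframe"}},
    {"title": "Placeholder advertisement",
     "tags": {"image", "marketing"}},
]


def with_tags(collection, *required):
    """Return the titles of items that carry every one of the required tags."""
    wanted = set(required)
    return [entry["title"] for entry in collection if wanted <= entry["tags"]]


print(with_tags(items, "marketing"))
print(with_tags(items, "marketing", "punch card"))
```

A broad tag such as marketing returns many items across institutions, whereas combining it with a narrower tag quickly reduces the result to a handful of sources.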

An advertisement for Remington Rand’s SYNCHRO-MATIC bookkeeping ma- 
chine featuring Helena Rubinstein as a person, a brand, and a user of the punch 
card machinery illustrates the documentation process described above.”° First I 
noted the file name of my photographs of the first and last pages of the 12-page 
advertising leaflet. I selected the photographs in the relevant folder and dragged 
them into Tropy. After merging them into a single item, I applied the Computer 
History Museum Archive metadata template from the drop-down menu. The 
template mimics the online catalog and includes additional information. The 
first photograph included the catalog number, so finding the record online 
was straightforward.27

Since the title in the catalog referenced the contents of an entire folder 
rather than this particular advertisement, the Tropy item title was taken from 
the front cover of the leaflet. I determined the date based on an example of an 
invoice dated 1940 inside the advertisement. The metadata for the publisher, 
collection, URL, catalog number, dimension, and provenance came from the 
Computer History Museum catalog. The description was based on the content 
of the advertisement and the box number came from my research visit preparation
notes which were based on the finding aid.28 Tropy automatically inserted
technical metadata and I added four tags to this particular item: applications 
and use, equipment, marketing, and punch card. After a comparison of all items 
tagged marketing, this advertisement became the focus of my Helena Rubinstein
punch card case study.29
https://www.computerhistory.org/collections/catalog/102683284.

28 Dale Jenne, “Guide to the David C. Faloon papers,” 2007, Computer History Museum, 
accessed November 12, 2020, https://www.computerhistory.org/collections/catalog/ 
102634421. 
4 Analyzing sources through the interface


The key advantage of digital asset management during the analysis phase of 
research lies in the easy navigation between an overview of all sources in the 
project view that can be filtered through tags or sorted based on metadata, and 
the detailed item view where each image or page can be annotated separately 
using notes. When describing the workflow of punch operators and the whole- 
sale department at Helena Rubinstein’s company in London in the second half 
of the 1950s, for example, the software allowed me to quickly switch between a 
transcription of a policy document, an image of several invoices, the corre- 
sponding punch cards, an order form, and personal notes of the manager of the 
machine room. Another example of how the interface facilitates the analysis is 
through the zoom function in the item view. For a visual discourse analysis of 
the Apple IIe advertisement, zooming into and panning over the image of the
advertisement and the corresponding text allowed me to capture small details 
that would be overlooked without this function. 

However, one issue I mentioned earlier remains unresolved. Greater flexibility
in navigating the collection results in a loss of the original order of archival sources
and thus a violation of the core principles of the respect de l’ordre and the respect des
fonds in archival science and the historical method.30 Gerben Zaagsma states that
the key issue facing digital history is the loss of context both on “the level of collections,
the use of digital archives or when dealing with information retrieval
strategies,” and in engaging with and experiencing historical materials.31

Although switching between items is easy, comparing images by placing 
them next to each other on the screen is not possible in Tropy itself. Within the 
program, users cannot open two images from the same item or two items next 
to each other. As a temporary solution the original files can be opened from the 
folder via the file path and placed next to each other outside of the program 
window. An interesting feature for in-depth discourse analysis is the possibility 
to select a specific part of the photograph which is added underneath and can 
be combined with additional metadata about the selection. This selection fea- 
ture can be useful for separating images from text in the analysis. 

Finally, the last FAIR principle - stating that data should be reusable - is
certainly supported by Tropy since the metadata adheres to existing ontologies.
Unfortunately, sharing photographs and metadata is limited because of archival
and museum policies, and heavily impeded by copyright laws and the European
Union’s General Data Protection Regulation (GDPR). Or, in the words of Julia
Damerow and Dirk Wintergrün, “copyright practices often prohibit sharing
acquired data, which significantly hampers attempts to reproduce or build on the
results of a project.”³² These legal restrictions had never hindered my previous
research as a medievalist, but they prevent me from sharing sources and
metadata concerning contemporary history. Due to these restrictions the dataset for
my project will remain internal to the institution and reusable only when
permission is granted by the archive or museum and the original copyright holder.


30 Andreas Fickers, “Update für die Hermeneutik. Geschichtswissenschaft auf dem Weg zur


5 Dissemination 


A final discrepancy to acknowledge is that the published images will differ sub- 
stantially from the photographs taken at the archive. The preparation of photo- 
graphs in photo editing software depends on the medium of publication. For 
online publications the resolution of an image file is usually set to 72 dpi or the 
standard monitor resolution, whereas publishers of printed publications generally 
require a higher resolution of 300 dpi. As DiMarco explains, print requires high 
resolution “because it is a high-fidelity media looked at closely by the viewer,” 
whereas “Web images are viewed on a screen, and at a distance.”³³ Aside from the
resolution, photo editing software can improve the lighting, straighten crooked 
photographs, and cut out the part of the image that is relevant. After editing an 
image I usually indicate the archive, call number, original image file name, and 
whether the reworked version is an edit or cutout of the original. 
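
The relationship between resolution and physical size described above comes down to one multiplication: pixels = inches × dpi. The following is a minimal Python sketch of that arithmetic, offered as an illustration only and not as part of the workflow described in this chapter. The function names and the 6000 × 4000 pixel photograph are assumptions made for the example.

```python
def pixels_needed(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions required to reproduce an image at a given physical size."""
    return round(width_in * dpi), round(height_in * dpi)


def max_print_size(width_px: int, height_px: int, dpi: int) -> tuple[float, float]:
    """Largest physical size (in inches) that an image supports at a given dpi."""
    return width_px / dpi, height_px / dpi


# The same 5 x 4 inch illustration at the two resolutions mentioned above.
screen = pixels_needed(5, 4, 72)    # online publication
printed = pixels_needed(5, 4, 300)  # printed publication
print(screen, printed)

# Hypothetical 6000 x 4000 pixel archive photograph, printed at 300 dpi.
print(max_print_size(6000, 4000, 300))
```

At 300 dpi the same illustration needs roughly seventeen times as many pixels as at 72 dpi, which is why an image prepared for the web rarely survives the move to print.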

As Andreas Fickers mentions in “Update für die Hermeneutik,” the notion of
the original becomes problematic in any case because data is, as Lisa Gitelman
and Virginia Jackson pointed out, never raw but rather the result of a process, as
I have illustrated here.³⁴ The provenance of a digital source is influenced by the
format, display, storage, and compatibility of a file. In other words, both the
software and the hardware used to store, open, and display the file influence the
representation of a digital or digitized source.³⁵


32 Damerow and Wintergrün, “The Hitchhiker’s Guide,” 520.

33 John DiMarco, Digital Design for Print and Web: An Introduction to Theory, Principles, and
Techniques (Hoboken, NJ: Wiley, 2010), 135.
As Matthew Kirschenbaum explains, “One can, in a very literal sense, never
access the ‘same’ electronic file twice, since each and every access constitutes a
distinct instance of the file that will be addressed and stored in a unique
location in computer memory. [. . .] Access is thus duplication, duplication is
preservation, and preservation is creation — and recreation.”³⁶ To further complicate
matters, Jacob Gaboury explains how “the computer is not a visual medium,”
but rather “our engagement with computing technology is increasingly
mediated through the interface of the screen.”³⁷ Although we perceive an image on
the screen as a representation of an original source, the file is saved on an SSD
in the form of zeros and ones, bits and bytes, or colored pixels.

To illustrate the discrepancy between what is stored and what is shown,
without wrecking a hard drive, I opened a PNG file with a text editor. The first
few letters made some sense (@PNG), but the rest of the file can only be
described as a gibberish of letters, numbers, and symbols. When opened with image
software, the same PNG file shows an edited cutout of an illustration, taken from
a user manual, of a Powers-Samas tabulator, a machine that processed punch
cards to produce reports.³⁸
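
The readable letters at the start of the file are not accidental: every PNG file begins with the same fixed eight bytes, of which only the three letters “PNG” are printable. The sketch below, in Python, is an illustration under stated assumptions rather than a reconstruction of the experiment described above. It builds a one-pixel PNG in memory (the helper `make_tiny_png` is hypothetical and exists only so the example needs no file on disk) and compares what a text editor would show with what the header encodes.

```python
import struct
import zlib

# Fixed 8-byte signature that opens every PNG file (defined by the PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_tiny_png() -> bytes:
    """Build a valid 1x1 grayscale PNG in memory (hypothetical helper for this example)."""
    def chunk(kind: bytes, data: bytes) -> bytes:
        body = kind + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    # IHDR: width, height, bit depth 8, color type 0 (grayscale), then three zero flags.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x80")  # one filter byte plus one mid-gray pixel
    return PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


def describe(data: bytes) -> dict:
    """Contrast the text-editor view of the first bytes with the decoded header."""
    width, height = struct.unpack(">II", data[16:24])  # IHDR width and height fields
    return {
        "is_png": data.startswith(PNG_SIGNATURE),
        "as_text": data[:8].decode("latin-1"),  # only 'PNG' is printable
        "width": width,
        "height": height,
    }


info = describe(make_tiny_png())
print(repr(info["as_text"]), info["width"], info["height"])
```

The first byte of the signature is deliberately outside the ASCII range, which is why different text editors render it as different symbols in front of “PNG”.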

To avoid a mismatch between the gray background of the image and the 
white pages of my printed dissertation, I deleted the background of the image 
using the magic eraser feature, in combination with the lasso and pencil cutout 
features of PhotoScape X. I also adjusted the perspective to correct the strange 
angle of a picture taken from a book. Finally, the file name of the edited image 
(CCHa_CH28274_DSCF3374_edit-cutout.png) references the archive where the 
image was made (Centre for Computing History archive), the call number of the 
source (CH28274), the file name of the original as assigned by the camera 
(DSCF3374), and the transformations made in PhotoScape X (edit and cut out). 
The name thus ensures that, even if the document is taken out of context when 
uploaded to Overleaf for insertion into my dissertation, I can easily relocate the 
source and remember which changes were made in the photo editing software. 
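
Because the naming scheme is regular, the provenance it encodes can also be read back by a program. The following Python sketch is one possible way to do so, not a tool used in this project. The class and function names are hypothetical, and it assumes that underscores separate the four components and that hyphens separate the individual edits, as in the example file name above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class EditedImageName:
    archive: str        # abbreviation of the archive, e.g. "CCHa"
    call_number: str    # call number of the source
    original_file: str  # file name assigned by the camera
    edits: list[str]    # transformations applied in the photo editor
    extension: str


def parse_image_name(file_name: str) -> EditedImageName:
    """Split ARCHIVE_CALLNUMBER_ORIGINAL_EDITS.ext into its parts (assumed scheme)."""
    path = Path(file_name)
    parts = path.stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected four underscore-separated parts, got {len(parts)}")
    archive, call_number, original_file, edits = parts
    return EditedImageName(
        archive, call_number, original_file, edits.split("-"), path.suffix.lstrip(".")
    )


name = parse_image_name("CCHa_CH28274_DSCF3374_edit-cutout.png")
print(name)
```

A scheme like this only works if no component itself contains an underscore; the sketch raises an error rather than guessing when that assumption is violated.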
6 Conclusion


Historians increasingly rely on digitized and born-digital sources. But digitizing 
collections is often an expensive undertaking and nearly always a political 
choice which can bias historians’ gravitation toward certain sources over others 
that are not digitized and yet form a much larger part of the overall collection. 
The problems of access to sources and a bias toward particular topics at the ex- 
pense of others also depend on which sources survive both in analog and digi- 
tal form, or how much of a museum’s collection is cataloged, accessible, and 
can be searched, versus what remains hidden. However, for geographically dis- 
persed archival research, “quick and dirty” digitization of sources through pho- 
tography has an impact on the kinds of questions historians can ask. I argue 
that although digital history is primarily concerned with digital tools and
methods for analysis, data entry, much like archival research, requires careful
reflection and should not be taken for granted.

Aside from selecting relevant sources and accompanying metadata for the
researcher’s collection, categorization and the use of established
ontologies ensure consistency, and assigning metadata in bulk can improve the
accuracy of the data. As the FAIR Guiding Principles state, “beyond proper
collection, annotation, and archival, data stewardship includes the notion of
‘long-term care’ of valuable digital assets, with the goal that they should be
discovered and reused for downstream investigations, either alone, or in
combination with newly generated data.”³⁹

The notion of the digital dark age in which potentially valuable digital sour- 
ces rapidly disappear into a “digital black hole” reveals several tensions that 
worry historians and archivists. In a constantly expanding mass of digital re- 
cords, and despite the common notion that what appears online “sticks,” data 
and files are easily deleted, or can become damaged, or incompatible with 
ever-changing hardware and software. What is often overlooked is the fact that 
behind most of the data there are people, organizations, and events that impact 
the sustainability of datasets. This holds true for museums and archives as well 
as their dynamic catalogs, and for the data management of individual research- 
ers’ projects and their affiliations with an institution. 

In other words, institutions are never permanent and both collections and 
institutions are subject to change. As soon as stakeholders disappear, so too 
might data after the termination of a project. As researchers or employees of in- 
stitutions move on and lose interest, a lack of transparency can result in the 
loss of the most valuable asset of research, its sources and data. The loss of
data becomes even more problematic in research funded by public institutions. 
III Digital experiences and imaginations
of the past


Marleen de Kramer 
3D models are easy. Good 3D models 
are not 


What’s the difference between a reconstruction and an artist’s impression? Merely 
that the latter does not imply scientific accuracy. The term “artist’s impression” — 
or the equally vague “visualization” — is often used in an attempt to bypass the 
issues of explaining how a reconstruction was validated, how and by whom design 
decisions were made, and how conflicting theories were reconciled. These pro- 
cesses are time-consuming to document — but very much necessary if the finished 
model is to be a scientific document in its own right,¹ as well as one in which “the
foundations of evidence for the reconstructed elements, and the reasoning around
them, are made not only explicit and interrogable but also can be updated,
extended, and reused by other researchers in future work.”

Furthermore, for many virtual reconstructions, other researchers are not
the only, or even the primary, users. Rather, reconstructions are often aimed
at communicating theories about the past to lay users such as museum visitors, 
readers of magazines, users of websites, and school pupils. This means that, to 
be truly useful, a finished model must not only appeal to the viewer and show 
the theory the creator intended, but must also engender a healthy skepticism in 
its viewers, to encourage them to interrogate it and learn about the data that 
went into its creation — and to show that, for all the hyperrealism a model can 
achieve, it remains a theory rather than the truth. 

This paper seeks to give an overview of how I approached this challenge in 
my case study, experimenting with ways to document the reconstruction pro- 
cess, ways to communicate to the user that there are underlying decisions that 
rely on very different types and qualities of data, and a way to test that users 
understand the principles involved. 
1 Documenting decisions


Initially, it seemed that the problem of documenting knowledge provenance 
and decisions would be trivial. An interdisciplinary approach suggested that I 
could simply co-opt and reuse a tool from industry, perhaps something used by 
industrial designers, engineers, or in forensics — a tool made to document de- 
pendencies, decision processes, and levels of confidence in those decisions, 
along with a database to store supporting documentation. 

Unfortunately, it turns out that such a tool does not exist. Over the past 
three years, I have approached visiting lecturers and conference speakers, 
friends working in different disciplines, and even written directly to representa- 
tives of various industries, and always received the same answer: that if proj- 
ects are documented at all, this is done in a non-standardized text format. 
Furthermore, the documentation process tends to fall by the wayside relatively 
early on in project timelines, one of the first tasks to be cut if time or finances 
are tight, and often seen as an onerous administrative burden rather than an 
integral part of a project. 

This, of course, fundamentally changed my research question. Developing 
such a tool would have been a PhD thesis in itself — or, more likely, a cross- 
disciplinary project for multiple researchers. Incidentally, the same was true of 
my originally proposed database structure, which the experts I consulted deemed 
“more of a hypercube than a table.” 

Therefore, having to rely primarily on my own knowledge across different 
disciplines, my question shifted from: How can I apply an industry documenta- 
tion method to a humanities project? to: How can the decision-making processes 
and the underlying data involved in a 3D reconstruction be communicated to 
end users, so that the reconstruction is a robust historical resource? 

This stage taught me two important lessons: to never assume that a task 
from another discipline is trivial; and to establish what is feasible, to determine 
scope before defining the question. 


2 The state of the art - documentation


Many researchers are currently working toward the documentation of cultural 
heritage knowledge. What makes this field especially challenging is its multidi- 
mensionality and the fact that many related data are not text-based and are 
therefore more difficult to annotate, browse, and catalog — requiring semantic 
enrichment to be searchable or machine readable. 
Two significant initiatives, the CIDOC CRM³ and Arches,⁴ were created
specifically to address these challenges. They go beyond the recording of sources,
to show the connections between historic events, objects, people, and places. 

CIDOC is the international documentation standards committee of the Inter- 
national Council of Museums (ICOM), and its object-oriented conceptual refer- 
ence model (CRM) “represents an ‘ontology’ for cultural heritage information 
i.e. it describes in a formal language the explicit and implicit concepts and rela- 
tions relevant to the documentation of cultural heritage.” This framework al- 
lows disparate data and sources to be mapped to a common frame of reference. 

Arches, an open-source software platform for cultural heritage data
management, is a practical implementation of CIDOC CRM, aimed at helping
cultural heritage institutions and organizations collect and manage their data in a
common format. It was developed jointly by the Getty Conservation Institute 
and World Monuments Fund and includes an app to make it easy for end users 
to gather data. 

Unfortunately, the latter is not suitable for individual projects unless it is 
supported at the host institution. As the project’s fact sheet cautions, “Arches is 
a powerful enterprise-level platform designed to be used at an organization or 
project level and not as an individual desktop application. As a result, adopters 
will need to identify a server to host the Arches platform and as with any enter- 
prise-level system, should expect to engage the services of an appropriate IT 
professional to set up and maintain it.”⁶ Though it is tempting to suggest this
could be solved by working together with a computer science researcher, this 
would not be research for them, but a simple implementation issue, making it a 
problem of infrastructure. 

Additionally, while Arches is designed to work with geographic information 
systems (GISs) or maps, it does not include a way to view and browse 3D mod- 
els. This is a task more appropriate to building information modeling (BIM) sys- 
tems, such as Heritage BIM (H-BIM) software for cultural heritage. 

H-BIM is an interesting and challenging field because it inverts the typical 
BIM process. In modern construction planning, architects and engineers can draft 
structures using BIM systems such as Archicad, which break a building’s compo- 
sition down into discrete parametric objects whose qualitative and quantitative 
metadata are fed directly into a database, allowing the building’s costs, structure,
building time, etc. to be quickly computed and analyzed. 

A heritage building, however, is “the result of modification and stratifica- 
tion processes carried out over time,” meaning that its existing or past ele- 
ments must be surveyed, analyzed, and then reproduced as-built in a BIM 
environment in order to enrich the resulting model with data, and that the in- 
tangible history of the building must be taken into account. 

While H-BIMs are specific to cultural heritage and are helping to address the 
specific intricacies of modeling existing historic buildings, even an H-BIM system 
combined with a heritage ontology such as CIDOC CRM is insufficient for docu- 
menting 3D reconstructions. Although relationships between data points are 
tracked, their dependencies are not. It is not possible to specify that a conclusion 
is true, or that an element exists only if a previous assumption is true, nor to 
compare different theories, nor to assign probabilities or degrees of accuracy to 
those theories. Furthermore, H-BIMs are not designed to allow for conflicting the- 
ories, or multiple versions of an element, to be contrasted. This means that while 
they are a good choice for tracking metadata, H-BIMs do not yet have capabilities 
that extend to paradata. 


3 The state of the art - communication 


To date, there has been no convention for communicating the data behind a 
model, in the way that bibliographical references are used to convey sources 
for text. While most 3D reconstructions are captioned for dissemination, as 2D 
images would be, their captions usually only extend to basic data such as what 
they represent and who their creators were, but give no information on the cre- 
ation process, or the metadata or paradata, nor an indication of which part of 
the model they apply to. 

Interactive 3D models are closing this information gap, as they allow differ- 
ent types of data to be displayed on the same model - for example, through the 
use of different textures, or by annotating certain elements. End users are be- 
coming more familiar with this mechanism through the increasing ubiquity and 
accessibility of 3D content, especially in gaming. 
Furthermore, virtual tours of museums and heritage sites are springing up
across the internet, many no longer even requiring specialist apps. In some 
cases, they communicate their metadata through an audio track, much like a mu- 
seum guide, while others use text, either embedded or as accompanying pop- 
ups. These digital offerings sometimes include virtual reconstructions, allowing 
users to switch between a view of the current physical site and a historical visual- 
ization, or show different types of content on different parts of the website. 

One notable example of this is the MayaArch3D project,⁸ which documents
historic Copan, Honduras, in a way that is scientifically rigorous yet accessible 
to the general public. Unfortunately, like many such initiatives, it does not de- 
liver the smooth, polished graphics that users have come to expect from video 
games. As many heritage sites and institutions do not have the budget to de- 
liver this technological state of the art, less ambitious solutions that do not at- 
tempt photo-realism are a reasonable compromise. 


4 Visualizing certainty 


Creating a new system entirely from scratch as an interdisciplinary but still solo 
researcher, without technical support or outside expertise for its implementa- 
tion, was clearly outside the scope of PhD work. I therefore decided to focus on 
creating a 3D reconstruction as a case study, to show the feasibility of visualiz- 
ing uncertainty in architectural reconstructions, using the existing tools of text 
and tables to document the process. 

To this end, I designed a two-dimensional matrix (Fig. 1). The horizontal 
axis has four categories, in decreasing order of certainty or estimated accuracy: 
relict, interpolated, extrapolated, and speculative. These categories are inten- 
tionally broad and were chosen to be reasonably self-explanatory. 

“Relict” covers elements for which evidence survives from the time of their
creation. “Interpolated” covers elements inferred from several nearby data points,
e.g. filling a gap in a wall along an existing foundation. Whereas this interpolated
result is a line between two points, an “extrapolated” result is a vector, using a solid
point of reference augmented with secondary and tertiary sources. “Speculative”
results are obtained using only secondary and tertiary sources, e.g. making
comparisons with similar sites or using engineering knowledge to estimate heights
of walls.

Fig. 1: A visualization of the four degrees of accuracy. 2020. © Marleen de Kramer.

The second dimension in the matrix is level of detail (LoD) (Fig. 2). At a low
LoD, the degree of accuracy may, conversely, be very high - with the location
and dimensions of a building already being known based on ruins, historic
maps, etc.⁹ - while at a very high LoD, accuracy may be low and all conclusions
speculative, with elements such as details of individual rooms having no evidential
basis, for example when there is no trace remaining of the original wall coverings.

This system of classifying accuracy relies on a segmented model whose gran- 
ularity increases with its LoD, so visual differentiation between levels of accuracy 
can be achieved through parameters like texture, transparency, or line weight, 
controlled by attribute tables attached to the segments. They can be adapted dy- 
namically, e.g. displaying anything with an accuracy of “interpolated” or better 
at a medium LoD. As new data are discovered or new conclusions reached, indi- 
vidual segments can be updated or their classification changed without invalid- 
ating the model entirely. 
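
The matrix and its attribute tables can be pictured as a small data structure. The Python sketch below is a hypothetical illustration, not the implementation used in this case study. The class names, the three LoD steps, and the example segments are all assumptions, and “at a medium LoD” is interpreted here as “segments whose level of detail is medium or lower.”

```python
from dataclasses import dataclass
from enum import IntEnum


class Certainty(IntEnum):
    """The four categories, ordered so that a lower value means higher certainty."""
    RELICT = 0
    INTERPOLATED = 1
    EXTRAPOLATED = 2
    SPECULATIVE = 3


class LoD(IntEnum):
    """Assumed three-step level-of-detail scale."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass
class Segment:
    """One model segment with its attribute-table entries."""
    name: str
    certainty: Certainty
    lod: LoD


def visible(segments, worst_certainty, max_lod):
    """Select segments at least as certain as worst_certainty, up to max_lod."""
    return [s for s in segments if s.certainty <= worst_certainty and s.lod <= max_lod]


# Hypothetical segments, invented purely for illustration.
model = [
    Segment("surviving foundation wall", Certainty.RELICT, LoD.LOW),
    Segment("gap in wall along foundation", Certainty.INTERPOLATED, LoD.MEDIUM),
    Segment("upper storey height", Certainty.EXTRAPOLATED, LoD.MEDIUM),
    Segment("wall coverings", Certainty.SPECULATIVE, LoD.HIGH),
]

# "Interpolated or better at a medium LoD"
shown = visible(model, Certainty.INTERPOLATED, LoD.MEDIUM)
print([s.name for s in shown])

# Reclassifying one segment leaves the rest of the model untouched.
model[2].certainty = Certainty.INTERPOLATED
print([s.name for s in visible(model, Certainty.INTERPOLATED, LoD.MEDIUM)])
```

In a rendering pipeline, the same attributes could drive texture, transparency, or line weight instead of a simple show-or-hide filter.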


9 LiDAR data, ground-penetrating radar, historic photographs, or other depictions or 
descriptions. 
5 Finding sources


The first step in creating a reconstruction was to pick a case study site, with the 
caveat that it should be a medieval castle in Luxembourg. Vianden Castle is the 
best-known, best-researched and one of the best-preserved such castles, so 
seemed the obvious choice - but a digital reconstruction was recently produced 
by a commercial archaeology consultant, so another reconstruction that repli- 
cated previous work would add little new knowledge to the history of Luxem- 
bourg’s castles and little value to the site. Instead, I settled on Larochette 
Castle, which is only partially rebuilt, but is reasonably well documented, and 
has some primary sources available in the form of written documents. 

The next step was to gather and investigate various primary and secondary 
sources, from John Zimmer’s seminal work on the castles of Luxembourg, “Die 
Burgen des Luxemburger Landes,” along with contemporary documents and 
contracts, to historic maps and images that, while not contemporary, show a 
pre-industrial view of the town. I quickly decided to reconstruct not just the 
castle but the surrounding town, as the economic and defensive structures of 
both work together, and so they cannot be viewed in isolation. 

There were several barriers to overcome in this step. First, certain docu- 
ments were not available - documents of whose existence I knew, but which 
were contained in archives I could not access, like the archaeological records of 


10 John Zimmer, Die Burgen des Luxemburger Landes, vol. 1 (Luxembourg: Imprimerie Saint- 
Paul, 1996). 
the castle excavation. Second, Luxembourg’s multilingual nature meant that
searching for “Larochette” was not enough: the castle and town are referred to 
in German and Luxembourgish as “Fels” and “Fiels,” respectively, and there 
are also historical variants like “Veltz.” Third, the name Larochette, meaning 
“the rock” in French, is far from unique, and simply adding “Luxembourg” to a 
search string does not achieve the desired results, as the region’s complex his- 
tory means that the town and castle belonged variously to other countries or 
now-defunct kingdoms. Finally, the search had to cover not just text sources, 
most of which are now machine readable and can be found via the usual search 
tools, but, more importantly, images. 
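
For machine-readable text, the problem of name variants can be partly addressed by searching for all known spellings at once. The Python sketch below illustrates the idea under stated assumptions: the variant list contains only the four spellings named above, the sample strings are invented, and a real search would need a longer, researched list.

```python
import re

# Variants named in the text; this list is illustrative, not exhaustive.
VARIANTS = ["Larochette", "Fels", "Fiels", "Veltz"]

# Word boundaries keep "Fels" from matching inside longer words such as "Felsen".
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in VARIANTS) + r")\b",
    re.IGNORECASE,
)


def find_variants(text: str) -> list[str]:
    """Return every place-name variant found in the text, in order of appearance."""
    return [m.group(1) for m in PATTERN.finditer(text)]


# Invented sample strings, not quotations from any source.
samples = [
    "Burg Fels",
    "Buerg Fiels",
    "Herrschaft Veltz und Larochette",
    "Felsen oberhalb des Dorfes",
]
for sample in samples:
    print(sample, "->", find_variants(sample))
```

A pattern like this cannot solve the ambiguity described above: “Fels” is also an ordinary German word for rock, so every match still has to be checked against its context.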

Finding, analyzing, and using images is inherently more difficult than
working with text. Unlike text, which only loses a small amount of metadata
when digitized, images are very sensitive to scanning and reproduction
methods: they lose resolution, their colors skew, and excerpts are difficult
to trace back to their original contexts. There is also no single established
way to cite them, and usage rights for publications are more difficult to obtain.


6 Interdisciplinary warm-up exercises 


While the format of our individual PhD projects left little time for joint projects, 
I was able to team up with two researchers from other disciplines within C²DH
for a small exercise designed to benefit each of our research projects, and 
which resulted in joint conference submissions. Sam Mersch is a linguist work- 
ing on microtoponyms in the Luxembourgish language, while Christopher 
Morse is studying virtual reality for cultural heritage institutions from a human- 
computer interaction perspective. 

Together, we reconstructed the landscape around Larochette Castle, using 
microtoponyms found in historical maps and other sources to pinpoint the loca- 
tion of landmarks such as churches and mills, and to determine which areas 
were under cultivation historically.¹¹ We then translated our hypothetical map
into a terrain model, as well as a reconstruction of one of the castle’s rooms, in
order to create an environment for an educational game in virtual reality.¹² In
this game, users combine elements from toponyms and place them on a map to
unlock landscape elements in the terrain below the castle.


11 Marleen de Kramer, Sam Mersch, and Christopher Morse, “Reconstructing the Historic
Landscape of Larochette, Luxembourg,” in Digital Heritage. Progress in Cultural Heritage:
Documentation, Preservation, and Protection, EuroMed 2018, Lecture Notes in Computer Science,
vol. 11197, ed. Marinos Ioannides et al. (Cham: Springer, 2018).

While an alpha prototype was tested at both a conference and a public- 
facing event, the game could not be fully developed in the limited time we had. 
Nevertheless, it was a valuable insight into interdisciplinary work, demonstrat- 
ing how each of our different disciplines approaches research, which tools we 
use, and, particularly, which steps were not obvious to the other researchers as 
being time-consuming. It also required us to negotiate a joint
vocabulary, which became an important tool when communicating with the
public, as we identified and defined some terms that would otherwise have 
been ambiguous to them. 


7 Reconstruction: Buildings in the town 


After this exercise had provided a solid hypothesis for the historical landscape 
surrounding the town, the next step was to reconstruct the town itself. The 
structures of buildings themselves can be fairly changeable over the centuries 
and indeed, the town of Larochette suffered several devastating fires after 
which dozens of houses had to be rebuilt. However, other elements, including 
building footprints, roads, and property boundaries, are much more resistant to 
change and can serve as the basis for a reasonable attempt to depict the town’s 
historical structure. The most significant of these changes occurred in the late 
nineteenth and early twentieth centuries, in the course of industrialization, so 
any earlier descriptions and depictions are likely to contain relicts of the town 
as it was circa 1550, the time period chosen for the reconstruction because the 
castle was then at its full extent and had not yet burned down. 

The town’s basic structure is determined by its geography. Situated at the 
junction of two valleys, the town is bounded by high cliffs to the northeast and 
southwest, with the castle on the rocky promontory to the southeast that gives 
the town its name. Historically, these two natural ‘walls’ were supplemented by 
two high, man-made walls - one of which closed off the southern end of the
valley and the other of which ran straight across it to the north. The other valley,
running almost perpendicular to the first, contains a wide, flat floodplain
around the White Ernz River. This river was straightened and largely buried,
and the two roads in the valley were merged into one, in the late nineteenth
century to make room for a narrow-gauge railway that terminated in Larochette;
but an approximation of its historical course can be found on the cadastral
map that depicts the area before these changes.


12 Christopher Morse and Marleen de Kramer, “What’s in a Name: Gamifying the Intangible
History of Larochette, Luxembourg,” 23rd International Conference on Cultural Heritage and
New Technologies, Vienna, Austria, 2018.

In his book, John Zimmer had already speculated on the general layout of 
the town when the walls were still in place, based on historical maps (though the 
book introduced some errors through oversimplification), so this formed the 
basis of my town reconstruction. The footprints of the main buildings were kept 
and extruded into simple houses that maintained the historical and, largely, con- 
temporary character of the town: two-story buildings with pitched roofs and 
single-story annexes. The infill of outbuildings was reduced in a speculative but 
logical progression and these structures were kept relatively low in profile. Con- 
trary to my first assumption that many of these would have single-pitched roofs, 
research revealed that double-pitched roofs were more likely, as shown in con- 
temporary paintings from the Greater Region - largely nativity scenes, which 
often show the shapes of barns and stables that were familiar to the artists. Major 
civic buildings, such as the church, were modeled somewhat more precisely, and 
negative spaces were broken up by larger empty spaces surrounded by low gar- 
den walls, which can still be seen today. 

These models were deliberately kept purely volumetric and with a low LoD. 
Where buildings were textured, simple textures with low detail were chosen
(cream-colored rendered walls, red sandstone foundation walls and lintels, tiled
roofs) to reflect the typical character of the region. The town wall and its
towers were partially extrapolated from their remains, and partially guided by John
Zimmer’s drawings and other historical views of the town. 

The area outside the town walls to the south housed what was (and still is) 
the commercial area of the town - presumably the floodplain accommodated a 
green where larger markets were held. This area is speculatively surrounded 
by smaller buildings, including some early industrial ones — the town had mul- 
tiple mills and tanneries at this time, as indicated by historical tax records and 
other reports, and located using microtoponyms and the eighteenth-century 
Ferraris Atlas. 


8 Reconstruction: Reconstructing the castle 


While the castle and the town function together, they are distinct entities, with 
different sources available. The major primary source here is, of course, the
castle itself. The ruins are run as a tourist attraction today; they have been
excavated, secured against further decay and supplemented with modern utility
buildings, but otherwise remain largely unaltered, apart from Créhange Manor, 
the castle’s best-preserved residential building, which has been physically 
reconstructed. 

Fortunately, there is an excellent source for determining which building 
parts are authentic: when the Luxembourg state acquired this and other castles 
for the nation in the late 1970s, John Zimmer, a building surveyor specializing 
in heritage, was tasked with documenting these acquisitions. He produced 
stone-by-stone drawings that showed the extent and condition of the buildings 
before any interventions took place and which were reprinted in his books on 
the castle. This data was supplemented by nineteenth and early twentieth cen- 
tury views of the town collected from archives and from the local history maga- 
zine, Les Cahiers Luxembourgeois, which dedicated a double issue to Larochette 
in 1938. While these views are largely artists’ renditions whose goal was not ob- 
jectivity, they are similar enough to give an impression of the state of the castle 
between its destruction and its restoration, especially the view of the southeast- 
ern facade from the hill opposite — this perspective was popular with painters. 

Leaving aside the physical reconstruction of Créhange, there have been 
three known attempts to visually reconstruct the castle. The first was a series of 
illustrations dating from around 1900, by Jean-Pierre Koenig, an architect and a 
member of the “Friends of the Castle” society. These drawings are quite fanci-
ful, and reflect the historicizing fashion of the day much more
than any serious scientific work. Most notable among their inaccuracies are a tall
tower above what should be a chapel in the Homburg Manor and the castle’s 
baffling indefensibility: while the walls, as drawn, have loopholes, they lack 
battlements — and the postern gate has external hinges that would be very easy 
to breach. This reconstruction also sprouts fanciful turrets and stepped gables 
for which there is no evidence in the castle ruins. 

The second reconstruction took a more scientific approach, being a draw- 
ing by John Zimmer based on his precise castle survey. Unlike the town map 
that he proposed, this reconstruction seems to account for all known castle 
parts and to incorporate some of the archaeological evidence. However, Zimmer 
failed to explain his reasoning or expound on his theories, so while the recon- 
struction may be sound, it is not supported by data. 

Finally, there is a model of the castle in the attic of Créhange Manor, recog- 
nizable as such only by association. Strangely, this model includes no metadata 
whatsoever, not even as to which castle it is supposed to represent — we do not 
learn who built it, when, or which era it presumes to show, so it cannot func- 
tion as a scientific document in any sense. 

As my self-imposed task was not “a complete and accurate reconstruction
of Larochette Castle,” but rather “enriching a model with paradata to allow fu- 
ture researchers to refine it,” my model as drawn is largely based, again, on 
John Zimmer’s reconstruction, though validated where possible by using other 
available data. 


9 3D model 


My volumetric model was drawn largely in Autodesk 3ds Max, which is a 
modeling and animation tool rather than a computer-aided design (CAD) or 
BIM system. This means that it lacks the accuracy of CAD - although this is not 
needed given the inaccuracies of the data underlying it - and does not have the 
native link to a database that would be provided by a BIM system. To create 
links to data more easily, the individual elements could, in theory, have been 
exported from Autodesk to a BIM system and assembled there. However, they 
would need to be transformed back to Autodesk anyway to be used in the web- 
viewable 3D environment that was my goal, and this step was not necessary. 

Instead, similar objects were transformed into instances, so that changes to 
one would propagate throughout the model easily, grouped by type. Each class 
of object was given a distinct name and then numbered to provide unique iden- 
tifiers. This was slightly complicated by the fact that the segmentation of the 
objects could vary by attribute; for example, the town wall might have pieces 
with different accuracies along its length whose extent did not match up with 
the different sources, so objects had to be broken down into sub-objects to 
allow this distinction. 
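
The segmentation problem described above can be illustrated with a minimal Python sketch. It is not the procedure carried out in 3ds Max; it only shows the underlying idea of cutting one object wherever any attribute changes, so that each sub-object carries exactly one value per attribute. The function names, the identifier pattern `townwall_001`, and all extents and values in the example are hypothetical.

```python
from typing import Dict, List, Tuple

# One attribute layer: (start_m, end_m, value) spans along the object's length.
Spans = List[Tuple[float, float, str]]


def value_at(spans: Spans, position: float) -> str:
    """Return the value of the span that covers the given position."""
    for start, end, value in spans:
        if start <= position < end:
            return value
    raise ValueError(f"no span covers position {position}")


def split_by_attributes(class_name: str, layers: Dict[str, Spans]) -> List[dict]:
    """Cut an object at every boundary of every attribute layer.

    Assumes that all layers cover the same overall extent. Each sub-object
    receives a numbered identifier (hypothetical pattern: class name + number).
    """
    # Collect the start and end points of all spans in all layers.
    cuts = sorted({p for spans in layers.values() for s, e, _ in spans for p in (s, e)})
    sub_objects = []
    for number, (start, end) in enumerate(zip(cuts, cuts[1:]), start=1):
        midpoint = (start + end) / 2
        record = {"id": f"{class_name}_{number:03d}", "start_m": start, "end_m": end}
        for attribute, spans in layers.items():
            record[attribute] = value_at(spans, midpoint)
        sub_objects.append(record)
    return sub_objects


# Hypothetical example: accuracy and source change at different points.
wall_layers = {
    "accuracy": [(0.0, 40.0, "surviving remains"), (40.0, 100.0, "extrapolated")],
    "source": [(0.0, 60.0, "survey drawing"), (60.0, 100.0, "historical view")],
}

for piece in split_by_attributes("townwall", wall_layers):
    print(piece)
```

Because the two layers change at 40 m and 60 m respectively, the wall in this example is split into three sub-objects rather than two.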

Finally, as the model was primarily designed to give a visual impression, 
the objects had to be adjusted to sit in the terrain without any floating, so they 
were extended under ground level to achieve this — therefore, the heights of the 
building objects were no longer the same as the heights of the buildings when 
viewed in isolation. The details of the terrain itself were taken from the cadas-
tral office’s 5 m digital terrain model and first interpolated using Terragen¹³ to
smooth the hard edges, then simplified so that the polygon count was reduced 
outside the immediate town area. While this introduced some fuzziness regard- 
ing the accuracy of the terrain, it made for a more realistic visual statement, as 
it resulted in shapes that were more organic. 


13 A program by Planetside Software for creating photo-realistic scenery and terrain models. 

In dealing with the difficulty of documenting metadata integrity and para-
data without access to a database system, the unique identifiers allowed me to
use an attribute table that contained, for each object, its classes and types, de- 
gree of accuracy, scale/level of detail, and so forth: a simple but structured way 
to store data. To avoid long texts in a table - and much error-prone copying 
and pasting — the argumentation, sources, etc. can be provided in documents 
for groups of objects, rather than individually. A true database would have sim- 
plified this process, but was outside the scope of the project. 
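
As an illustration of such an attribute table, the following Python sketch stores one row per object identifier in CSV form and points each row to a shared document that holds the argumentation for a whole group of objects. The column names, identifiers, file names, and the numeric 1 to 4 accuracy scale are assumptions made for the example, not the actual table used in the project.

```python
import csv
import io

# Hypothetical column names, loosely following the attributes listed above.
FIELDS = ["id", "object_class", "object_type", "accuracy", "lod", "paradata_doc"]

# Hypothetical rows; accuracy is an assumed 1-4 scale.
ROWS = [
    {"id": "townwall_001", "object_class": "fortification", "object_type": "wall",
     "accuracy": 1, "lod": "low", "paradata_doc": "townwall.txt"},
    {"id": "house_014", "object_class": "building", "object_type": "house",
     "accuracy": 3, "lod": "low", "paradata_doc": "town_houses.txt"},
    {"id": "house_015", "object_class": "building", "object_type": "house",
     "accuracy": 3, "lod": "low", "paradata_doc": "town_houses.txt"},
]


def write_table(rows):
    """Serialize the attribute table to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_table(text):
    """Read CSV text into a dict keyed by identifier, rejecting duplicate ids."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["id"] in table:
            raise ValueError(f"duplicate identifier: {row['id']}")
        table[row["id"]] = row  # note: all values are strings after reading
    return table


def objects_by_document(table):
    """Group identifiers by the document that holds their argumentation."""
    groups = {}
    for object_id, row in table.items():
        groups.setdefault(row["paradata_doc"], []).append(object_id)
    return groups


table = read_table(write_table(ROWS))
print(table["house_014"]["paradata_doc"])
print(objects_by_document(table))
```

Keeping the long texts in one document per group means that a change to the argumentation is made once, while the table itself stays short enough to check by eye.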

As presented to the user, the model has three modes. The first, a neutral 
mode, allows the user to explore the model as a whole, i.e. a realistic view of the 
whole reconstruction (Fig. 3). The second mode color-codes the buildings by 
function and type and, when an object (such as a building feature or part) is se- 
lected, provides short texts about its purpose and history (Fig. 4). The third mode 
switches to a different color ramp that reveals the certainty according to my pre- 
viously described four-step system of degrees of confidence (Fig. 5). Selecting 
buildings in this third mode gives the user a brief description of the sources un- 
derlying their reconstruction, as well as any particularly relevant argumentation. 
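
The logic of the three modes can be sketched in a few lines of Python. This is only an illustration of how one object record could drive all three views; the mode names, the color values, the function categories, and the numeric 1 to 4 certainty scale are assumptions, and the sketch says nothing about how the actual web viewer is implemented.

```python
from enum import Enum


class Mode(Enum):
    NEUTRAL = "neutral"      # realistic view of the whole reconstruction
    FUNCTION = "function"    # color by function and type
    CERTAINTY = "certainty"  # color by degree of confidence


# Hypothetical color tables.
FUNCTION_COLORS = {"residential": "#4c78a8", "religious": "#f58518", "defensive": "#54a24b"}
CERTAINTY_RAMP = {1: "#1a9641", 2: "#a6d96a", 3: "#fdae61", 4: "#d7191c"}


def display(record, mode):
    """Return (override_color, info_text) for a selected object.

    A color of None means that the object keeps its textured appearance.
    """
    if mode is Mode.NEUTRAL:
        return None, ""
    if mode is Mode.FUNCTION:
        return FUNCTION_COLORS[record["function"]], record["history"]
    if mode is Mode.CERTAINTY:
        return CERTAINTY_RAMP[record["certainty"]], record["sources"]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical record with placeholder texts.
example = {
    "id": "church_001",
    "function": "religious",
    "certainty": 2,
    "history": "Short text about purpose and history.",
    "sources": "Short text about the underlying sources and argumentation.",
}

for mode in Mode:
    print(mode.value, display(example, mode))
```

The point of the design is that metadata (function, history) and paradata (certainty, sources) live in the same record, so switching modes changes only what is shown, not what is stored.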


10 Going public 


Documenting the model was not enough, though - it also needed to communi- 
cate to users that there are decision-making processes behind the 3D recon- 
structions, and that these models are not “the truth” but a fairly accurate 
theory. This stage of the project moved beyond architectural history and 3D 
modeling into the realms of digital storytelling, educational psychology, and 
website design. It could be implemented in various ways, from creating a spe- 
cial exhibit at the site itself to implementing a fully digital display. 

Initially, my intention was to run special group projects with local school 
pupils aged around 12-14 years old - the age at which they are learning about 
castles and the Middle Ages. Source criticism, especially of digital resources, is 
an increasingly important skill for students to learn, and one that can be taught 
in many disciplinary contexts - in this case, as part of a history lesson. 

For this approach, the students would explore a 3D model of the castle that 
showed both metadata and paradata, then engage in a group activity to test 
their understanding of the data types and whether they could distinguish be- 
tween the metadata (what we know) and the paradata (how we know it) by pro- 
ducing their own. 

Unfortunately, the necessary response to the Covid-19 pandemic prevented
this approach from being implemented: pupils first had to be taught remotely,
and the new school year then left no time for special projects, as the
children were catching up on material from the core curriculum. Instead, the focus
shifted to using a questionnaire. Originally intended as a minor part of a history 
lesson, to see whether understanding how reconstructions work influenced pu- 
pils’ ability to understand the model and its data, the original questionnaire was 
extended, refined (in the context of the user experience design course offered by 
the University of Luxembourg’s psychology department) and translated into ver- 
sions for adults and school pupils, in German as well as English, then dissemi- 
nated both via social media and directly through schools and institutions. 

The questionnaire takes users through a reconstruction process, then seeks 
to gauge their critical understanding of existing reconstructions. Initially, users 
are shown a view of the modern-day town of Larochette and a 3D model of the 
same view in the sixteenth century. One building, the gatehouse, has been 
omitted. Users are tasked with choosing one of six possible versions of it to fit 
the gap shown in the model — and to indicate how they made their choice and 
how confident they are that it is correct. 

They are then shown a series of historical images of a building and asked 
to make a choice between two options for each image set — each of which per- 
tains to a different aspect of the building in question - in order to guide them 
through reconstructing the building from original sources. They are then again 
asked about their confidence in the result. 

Next, users look at historical images for another building and are asked to 
make choices about those, but are presented with more options and allowed to 
suggest their own solutions. And finally, they evaluate existing reconstructions, 
answering questions that are carefully phrased to avoid indicating which as- 
pects are considered most important. 

With over 400 participants at the time of writing, this survey¹⁴ will not only yield in-
teresting results on its own, but also serve to improve the model itself, and the
website on which it is to be presented. Preliminary analysis shows fascinating 
differences in users’ confidence in their own choices - from “I don’t have enough 
information to say anything definite” to “The historical sources must be wrong 
because they don’t match my initial theory” — as well as in their willingness to 
trust authority figures even without being given evidence, and in their ability to 
deal with uncertainty, which some think is fascinating but many find frustrating. 


14 The distribution page for the survey is temporarily hosted at https://wordpress-111824- 
1160269.cloudwaysapps.com/index.php/larochette-quiz-en/, accessed September 3, 2021. 


11 Conclusions


The inclusion of metadata and paradata in 3D reconstructions involves two sep- 
arate problem areas: gathering and recording the data, and displaying and 
communicating them. While standard data formats for metadata are already 
being used and developed, they are still highly complex and require specialized 
work environments — and no such standards yet exist for paradata. 

While most end users understand footnotes and image captions, 3D models 
do not yet have any such established conventions. Therefore, the problem is 
not just which format of model display to use, but how to communicate to 
users that there is something they should be looking for: that there is further 
information to be found within the model and why it is important. 

This is also true in an interdisciplinary environment, where two of the main 
difficulties in working together are establishing a common vocabulary and de- 
ciding on ways to communicate. 

Perhaps some problems have not yet been solved not because they are not 
interesting, but because they are more difficult than expected. 


Jakub Bronec
Walking through the process 


Teaching Jewish history in Luxembourg with the help 
of digital tools 


1 Introduction 


Continually improving technology, and the expansion of virtual educational 
options worldwide, mean that tutors need to develop their understanding of 
new pedagogical approaches that are effective for teaching in a digital environ- 
ment. As McBride et al. have put it, in the context of teaching controversial 
topics such as the Holocaust: “This [digital teaching] phenomenon has ushered 
in a new era of education thus bringing forth a myriad of new questions and 
issues that must be addressed.” 

This chapter looks into new challenges that have arisen in the teaching of 
Jewish post-Holocaust history in the past decade, especially in the fields of digi- 
tal storytelling and oral history. Angelyn Balodimas-Bartolomei judged that stu- 
dents of all ages need to experience new educational methods and a new level 
of involvement as regards the teaching of modern Jewish history.² It is generally
accepted that teachers can no longer satisfy students’ desire to learn by using 
frontal teaching alone; they have to actively involve their students in the teach-
ing activities.³


1 Holly McBride, Brandon Haas, and Michael Berson, “Teaching the Holocaust at a Distance: 
Reflections from the Field,” Ohio Social Studies Review 51, no. 1 (2014): 18. 

2 Angelyn Balodimas-Bartolomei, “Political and Pedagogical Dimensions in Holocaust Educa- 
tion: Teacher Seminars and Staff Development in Greece,” Diaspora, Indigenous, and Minority 
Education 10, no. 4 (2016): 242-54. See also Alina Bothe, Die Geschichte der Shoah im virtuellen 
Raum (Berlin: De Gruyter, 2018). 

3 Frontal teaching (teacher-centered instruction, typically led from the front of a classroom) is 
still predominant, even though this traditional form of teaching is not adequate and does not 
promote the intellectual and emotional involvement of students. See: Snježana Nevia Močinić,
“Active Teaching Strategies in Higher Education,” Metodički Obzori/Methodological Horizons
7, no. 2 (2012): 97-105. 

One of various possibilities for motivating students is to use real-time edu-
cational applications that stimulate the imagination and involve not just mem-
ory but also other aspects of the mind such as empathy and critical thinking.
Keith Barton and Alan McCully found in their study that students prefer courses 
that emphasize interactive and cooperative projects, and that the history class-
room seems a natural venue for such projects.⁴ It is also important for the
teacher to create awareness among students regarding historical debates, since 
current policy debates are invariably rooted in history. 

This chapter will introduce two different applications: the IWalk interactive 
educational app, and MAXQDA, a tool for qualitative data analysis. I used them 
both during the video content analysis course that I ran for bachelor students at 
the University of Luxembourg. MAXQDA served as a gatekeeping tool to help 
shortlist video testimonies for the real-time IWalk application, from the large num-
ber now available. I organized this course as part of the research within my PhD
project based on the cultural and educational history of the Jewish postwar popu- 
lations in Luxembourg and Czechoslovakia. I intend to elaborate on the outcome 
of the course, via students’ reflections, in my PhD chapter entitled “Teaching of 
Holocaust and public Jewish history in the postwar states of Czechoslovakia and 
Luxembourg,” in which I will critically analyze a practical output of this course, 
together with the collected data used in both the aforementioned applications. 
The aim of that chapter is to evaluate and compare the development of teaching Jew-
ish public history in Luxembourg and Czechoslovakia.


2 The concept of IWalk 


The IWalk application was developed by USC Shoah Foundation — The Institute 
for Visual History and Education. This interactive educational app connects spe- 
cific physical locations with memories of historical events that took place in those 
locations. It is currently focused on locations in Europe and the US. The USC 
Shoah Foundation is not a standard scientific institution or cultural foundation. It 
oscillates between different poles and it is difficult to grasp its institutional form. 
Nowadays, it has a large number of branch offices and contract holders (associ- 
ated cultural and historical associations) scattered all over the world. 

After months of beta-testing with educators around the globe, USC Shoah 
Foundation launched a brand new version of the IWalk app in 2019, offering 29 


4 Keith Barton and Alan McCully, “Teaching Controversial Issues . . . Where Controversial Is- 
sues Really Matter,” Teaching History 1, no. 127 (2007): 13. 




IWalks in seven countries and eight languages. Visitors and students can dis- 
cover curated multimedia tours that connect specific historical sites, and loca- 
tions of memory and memorialization, with testimonies from survivors and 
witnesses of genocides, violence, and mass atrocities. The application no longer 
contains only testimonies from Holocaust victims — users can now work with 
genocide stories from all over the world.⁵ IWalk contextualizes and humanizes
the history of sites of memory by using testimonies, photographs, and maps,
thus enabling users to experience the past for themselves.⁶ The result is a
unique multimedia experience that provides users with a personalized learning 
experience at sites of memory around the world, in multiple languages. 

People walking through the locations of the different tours can use smart- 
phones, tablets or computers to watch video clips of survivors and witnesses 
telling personal stories about the role of these locations in their experiences. 
Students create these clips on an online educational platform called IWitness 
using a free and intuitive video editor. As I’ve explained previously in a blog-
post, in order to make meaningful video tracks for IWalk tours, these clips are
usually extracted from full-length video testimonies that are in USC Shoah 
Foundation’s Visual History Archive. And the testimonies, along with photos, 
documents, maps, and other primary sources that can be displayed on users’ 
devices, tell a story that connects past events to present locations in a way that 
underlines the gravity and reality of what occurred there.⁷


3 The course: Empowering the students
to engage with local history


Ian Davies judges that “[p]erhaps the most difficult of the pedagogical issues re-
lates to the choice that teachers have to make in deciding how to present the Ho-
locaust and what sort of educational aims are valid,”⁸ and Totten and Feinberg


5 IWalk testimonies now include genocide stories from multiple countries, including Rwanda, 
Armenia, and Sudan. 

6 Regarding the humanization of history, see James K. Bidwell, “Humanize your Classroom 
with the History of Mathematics,” Mathematics Teacher 86, no. 6 (1993): 461-4. 

7 Jakub Bronec, “IWalk: Mapping Jewish Life with your Mobile - New Ways of Teaching Jewish 
History in Luxembourg,” Digital History & Hermeneutics blog, March 9, 2020, accessed July 16, 
2020, https://dhh.uni.lu/2020/03/09/iwalk-mapping-jewish-life-with-your-mobile-new-ways-of- 
teaching-jewish-history-in-luxembourg/. 

8 Ian Davies, Teaching the Holocaust: Educational Dimensions, Principles and Practice (Lon- 
don: Continuum, 2000), 5. 

argue that teachers should avoid overusing visual material and trying to explain
the Holocaust too simply.⁹ Barton and McCully found that “there is empirical
evidence that [engaging students in controversial issues discussion] can suc- 
ceed and that classroom discussions, in which several sides of an issue are 
explored and in which students feel comfortable expressing their views, are 
associated with a range of positive outcomes.”¹⁰ Based on their research, it
appears important to motivate students to analyze problems from the past, 
such as the Holocaust, as well as to critically evaluate different opinions in 
classroom discussion. These ideas were later supplemented by Huei-Tse Hou 
and Sheng-Yi Wu, in whose opinion “teachers can lead discussions in two for- 
mats: synchronous or asynchronous. Typically, asynchronous forums are the 
most widely utilized format because students have more time to respond to 
discussion topics, which leads to deeper levels of thinking.” 

Working in the academic sphere, I have actively participated in a large 
number of courses addressing the issue of perceived conflict in society, cover- 
ing issues such as anti-Semitism, xenophobia, or homophobia. Timothy Peace 
demonstrated from sociological statistics that hatred and animosity toward all 
kinds of minorities had increased significantly in France, for example, espe-
cially among young people.¹³ Given these data, the question is: How can
we foster awareness of history among young people? The purpose of the IWalk
project in Luxembourg is to create interactive and educational online historical 
tours and provide rich content for online applications — primarily involving uni- 
versity students, but also secondary school pupils — based on the principles of 
open science. It aims to motivate students to become active content producers 
and not mere consumers. I organized a semester-long course for university stu- 
dents that set out to create an experimental working environment that would 
provide them with the freedom to be creative. The idea was that the tutor 
would assume the role of mediator by helping the students to interact with the 
materials and derive their own conclusions. 


9 Samuel Totten and Stephen Feinberg, “Teaching about the Holocaust: Issues of Rationale, 
Content, Methodology, and Resources,” Social Education 59, no. 6 (1995): 323-33. 

10 Barton and McCully, “Teaching Controversial Issues,” 13. 

11 Barton and McCully, “Teaching Controversial Issues,” 13-9. 

12 Huei-Tse Hou and Sheng-Yi Wu, “Analyzing the Social Knowledge Construction Behavioral 
Patterns of an Online Synchronous Collaborative Discussion Instructional Activity Using an In- 
stant Messaging Tool: A Case Study,” Computers & Education 57, no. 2 (2011): 1459-68. 

13 Timothy Peace, “Un antisémitisme nouveau? The Debate about a ‘New Antisemitism’ in 
France,” Patterns of Prejudice 43, no. 2 (2009): 103-21. 

When it comes to controversial topics, there are several advantages to hav-
ing students engage in collaborative projects, but it is important to create deep
and meaningful instructional structures and to foster abilities, skills, and ap- 
proaches based on content-specific critical enquiry. How can a teacher create 
this environment? According to David Pace, teachers should begin a project 
with sufficient objective background information on the controversial issue or 
topic, offer multiple perspectives, model how to address controversial issues, 
contextualize the issue to ensure student comprehension, allow students to prac- 
tice discussing similar controversial issues prior to the planned topic, and pro-
vide ground rules for the class discussion.¹⁴


4 Methodology of the video content analysis 
course and digital source criticism 


The aim of my video content analysis course was to help students apply new 
educational methods in order to facilitate learning about Jewish society in Lux- 
embourg. In the winter 2019/2020 semester, I taught bachelor students at the 
University of Luxembourg to use the IWalk app as an example of using archival
sources in the digital era. During the individual sessions, participants used the 
intuitive video editing software on the online IWitness/IWalk platforms. The 
course also provided students with an overview of how to use and edit histori-
cal photos, maps and personal documents as archival sources.¹⁵

There was an emphasis on remembering local history and, as part of this 
effort, the course featured a tour of sites in Esch-sur-Alzette and Luxembourg 
City which were relevant to the German occupation of Luxembourg and the 
atrocities committed by the Nazi regime toward local communities. The Jewish 
community was strongly antagonized by Nazi propagandists and, along with 
Roma people, homosexuals, and political detractors of the occupying regime, 
faced heavy persecution.¹⁶ Based on the training I received at the Zachor Insti-
tute of Social Remembrance in Budapest, I guided the students in following an


14 David Pace, “Controlled Fission: Teaching Supercharged Subjects,” College Teaching 51, 
no. 2 (2003): 42-5. 

15 Bronec, “IWalk: Mapping Jewish Life.” 

16 For further reading see Vincent Artuso, La Question Juive au Luxembourg | L’état luxem- 
bourgeois face aux persécutions antisémites nazies (Luxembourg: Editions forum Luxembourg, 
2015); and Laurent Moyse, Du rejet à l’intégration: histoire des Juifs du Luxembourg des origines
à nos jours (Luxembourg: Editions Saint-Paul, 2011). 

educational methodology based on the four Cs of consider, collect, construct,
and communicate.¹⁷

The students were divided into two working groups, with one group tasked 
with designing a virtual tour of Luxembourg City and the other group designing 
one for Esch-sur-Alzette. The participants took pictures of current buildings 
and locations associated with Jewish war history and compared them with orig- 
inal historical photos taken before and during the war. The students were en- 
couraged to reflect on how the appearance and function of certain buildings 
had changed over time.¹⁸

The theme of an IWalk should be specific enough to meet a learning out-
come, but the content of the IWalks in Luxembourg and Esch-sur-Alzette is
rather varied. The students could choose from a wide range of themes such as 
civil resistance (a topical theme), historical events (chronological), memorials 
(spatial) and Jewish traditions (historical). To maintain a clear focus, the video 
testimonies associated with the theme of each IWalk were limited to three mi- 
nutes. The students were advised to choose different interviewees in order to 
highlight different personal perspectives, and they created a short biography 
for each clip, giving details of the interviewee’s life before, during, and after 
the Second World War.¹⁹

To successfully complete the course, the students had to consider different 
examples of perceived conflict throughout history,²⁰ different definitions of intol-
erance, and selected clips of testimonies related to a specific type of resentment
(e.g. religious, political, etc.), and construct a video essay about a specific inci- 
dent. Last but not least, the students communicated their reflections to their 
peers via video essays. They were then given an opportunity to use technology to 


17 The four Cs in more detail are: consider background information that draws on previous 
knowledge; collect information from testimonies, biographies, and physical locations; con- 
struct — analyze and evaluate information and one’s own reflection; communicate — discuss 
that reflection. For further reading see Andrea Szonyi and Kori Street, “Videotaped Testimo- 
nies of Victims of National Socialism in Educational Programs: The Example of USC Shoah 
Foundation’s Online Platform IWitness,” in Interactions: Explorations Of Good Practice in Edu- 
cational Work with Video Testimonies of Victims of National Socialism, ed. Werner Dreier, An- 
gelika Laumer, and Moritz Wein (Berlin: Stiftung EVZ, 2018), 266-80; and Dagi Knellessen and 
Ralf Bachmann, From Testimony to Story. Video Interviews about Nazi Crimes. Perspectives and 
Experiences in four Countries (Berlin: Stiftung EVZ, 2015). 

18 Bronec, “IWalk: Mapping Jewish Life.” 

19 See also Nigel King, Christine Horrocks and Joanna Brooks, Interviews in Qualitative Re- 
search, 2nd ed. (London: SAGE Publications, 2018). 

20 These not only related to anti-Semitism: the students had to reflect on and compare other 
forms of hate against stigmatized minorities in history. 

become more active learners, while encountering eyewitnesses talking about
their experiences in different historical periods.²¹ The work was published
in December 2020 on USC Shoah Foundation’s IWitness website - an educa- 
tional platform that responds to the demand to build multiliteracy skills and 
responsible digital citizenship among educators and students.²²


4.1 Nature of the clips used and the selection process 


My role as tutor was to identify testimony clips that would support students’ 
learning of the topic under study, whilst ensuring that these clips were not too 
graphic, emotional, or lengthy and that they were appropriate for the desired 
learning outcomes and the diverse audience. I was also responsible for provid- 
ing a broader historical context for the testimonies, considering the contexts in 
which the interviews were conducted. I did, however, take into account that
part of the academic community is very critical of the methods used to
conduct and maintain testimonies.²³

Some scholars particularly decry the use of a vague system of questions 
given to interviewees. They often favorably cite the interviews undertaken by 
David P. Boder as examples of strictly and precisely led conversation.²⁴ The fact
is that many witnesses felt a time pressure to pass on their testimonies to 
others, but they had already gone through a phase of biographical stabilization 
(getting married, starting a family, emigrating, building a new livelihood) by 
that point. It is clear that their memories cannot be viewed without critical re- 
consideration, but it is arguable whether or not parts of video clips that illus- 
trate Jewish daily life should be used in order to provide another historical 
dimension for those interested in history. Video recordings, or even audio re- 
cordings, have also given those who found themselves unable to write down 
their memories the opportunity to pass them on in other ways. 


21 Barton and McCully, “Teaching Controversial Issues,” 13-9. 

22 “IWitness USC Shoah Foundation, One Voice at a Time,” accessed July 20, 2020, https:// 
iwitness.usc.edu/sfi/. 

23 See, for example, Bothe, Die Geschichte der Shoah, 103. 

24 Frank Mehring, “The 1946 Holocaust Interviews: David Boder’s Intermedia Project in the 
Digital Age,” Amerikastudien 58, no. 1 (2013): 139-50. “In the summer of 1946, Chicago-based 
psychologist David P. Boder undertook a remarkable interview project. [. . .] Boder went to
shelter houses in and around Paris, Geneva, Munich, Wiesbaden, and Tradate [Italy] to con- 
duct 130 interviews. [His archive] has been excavated and remediated [. . .] by the Illinois In- 
stitute of Technology.” 

Visual history archives (VHAs) such as those of USC Shoah Foundation and
the Fortunoff Archive reveal important views, visual codes, and cultural patterns. 
However, users have to take into account a few critical aspects that are associated 
with VHAs, as some sources of oral history do not give a well-balanced depiction 
of historical events. Collecting memories and testimonies is a good example of 
how difficult it is to create coherent archives for academic and educational use. 
Administrators have to contend with multilingualism, divergent memories of in- 
terviewees depending on their country of origin, and the subjective narratives of 
those interviewees. The potential for errors and misunderstandings in translations 
and transcriptions can also be significant. Moreover, there are the technical chal-
lenges of digital archival indexing. According to Alina Bothe, the Fortunoff Ar-
chive at Yale is regarded as a professional benchmark for conducting interviews,
and academic scholars use it as an alternative to the USC Shoah Foundation data-
base.25 However, based on my experience, the Fortunoff Archive has been strug-
gling with its indexing of testimony metadata and shows several shortcomings
in terms of its methodology for conducting interviews.26 We can conclude that
testimonies deposited in a VHA are grounded in the memories of survivors 
(changing though they may be), but that scholars can benefit from putting 
them into historical context. A platform itself does not provide broad histori- 
cal context, but the USC Shoah Foundation has recently organized several 
workshops on providing historical context and ethical editing, in which ques- 
tions of digital source criticism and responsible practices for editing authentic 
historical testimonies were discussed with teachers and researchers.27

IWitness, an educational website for teachers and their students, could 
prove to be a turning point as it now provides a broader historical context for the 
database of testimonies, through the workshops it organizes for teachers. This 
interactive online platform gives students access to more than 1,600 testimonies 
for guided exploration, and more than 39,000 educators around the world have 
been trained to incorporate its testimonies into classroom lessons. The medium 
itself can provide opportunities for students to engage their media literacy skills, 
by challenging them to critically consider the sources: Who is this person? Why 
are they telling their story? To whom? Under what circumstances?28 My students


25 Bothe, Die Geschichte der Shoah, 108. 

26 Jakub Bronec, “Malach Visual History Center Conference and Workshop on the New Proce-
dure and the Use of the Fortunoff Video Database,” Marginalia Historica: Časopis pro dějiny
vzdělanosti a kultury 7, no. 1 (2020): 164-7.

27 Andreas Fickers, “Digital hermeneutics: the reflexive turn in digital public history?,” un- 
published document: 6. 

28 “USCF Teaching guidelines,” last modified May 11, 2015, www.facinghistory.org.

and I also selected clips that would support the use of the KWL (Know, Want [to
know], Learned) teaching methodology,29 choosing ones for which the interviews
had been conducted in the same way. These clips are of similar lengths, were
recorded in the same year and in the same country, and we only picked ex-
cerpts associated with the same questions, such as, Did you ever experience anti-
Semitism before the war?

Future users of the IWalk app are likely to come from different backgrounds
and not to have a thorough knowledge of citizen history. The students develop-
ing the Luxembourg app content therefore adapted the wording relating to the
individual stops on the virtual tour to make the content more comprehensible to
the general public. To identify relevant locations for each thematic perspective, stu-
dents chose the clips first. It was important to make a connection between the
chosen stops and the clips, and it was considered advisable, although not essen-
tial, to have a clip at each stop. Each stop in the tour also had to be within easy
walking distance of the previous stop. After long discussions, students eventually
created six stops dealing with issues of pre-war anti-Semitism, forced emigration,
restitutions, active resistance, and Jewish traditions in Luxembourg.


4.2 Participants: Students and tutors 


Three experts from the Zachor Foundation for Social Remembrance in Budapest
and the Malach Centre for Visual History in Prague came to Luxembourg to
help me implement IWalk and MAXQDA so that I could run my video con-
tent analysis course. To improve my own expertise in this field, I completed a
number of in-depth training sessions on how to effectively use IWalk with other
historical resources, such as archival materials. Because of the detailed nature
of qualitative research, small sample sizes are recommended, with a focus on
“selecting information-rich cases for study in depth. Information-rich cases are
those from which one can learn a great deal about issues of central importance
to the purpose of the inquiry.”30 I also attended several workshops at the Mal-
ach Centre and the Zachor Foundation on maintaining ethical standards when
conducting interviews. These mainly familiarized workshop participants with 
the pre-interview questionnaire completed by narrators (interviewees) which 
enables us to discover, among other things, a narrator’s biography, political 


29 Students identify what they already know about the time period the testimony references, 
what they want to know about that time period, and what they learned from the testimony. 

30 Michael Quinn Patton, Qualitative Research & Evaluation Methods: Integrating Theory and
Practice, 4th ed. (Thousand Oaks, California: SAGE Publications, 2015), 230.

thinking, and religious classification, as well as relevant metadata related to
key locations in their narrative. 

As for USC Shoah Foundation, its first goal is to “overcome prejudice, intol- 
erance, bigotry — and the suffering they cause - through the educational use of 
the Foundation’s visual history testimonies.”31 Secondly, it is dedicated to mak-
ing “audio-visual interviews with survivors and witnesses of the Holocaust and
other genocides a compelling voice for education and action.”32 Alina Bothe
eventually admitted in her critical 2018 publication that the Foundation had
shifted toward being more of an educational and scientific institution under its
then director.33 Despite criticism from scholars (albeit open and often meaning-
ful), the Foundation remained a trustworthy institution for narrators them-
selves. When IWitness was launched in its alpha version in 2010, the thousand
narrators whose testimonies were to be included on the platform were asked
whether they were comfortable with their testimonies being published online.
Of 1,000 notices issued, fewer than 1 percent of the interviewees requested that
their testimony be withdrawn from Internet-based distribution.34


5 MAXQDA as an analytical digital tool: Creating 
a new experimental space for IWalk projects 


While working on the IWalk tours in Luxembourg, students had to analyze the
selected testimonies by applying the MAXQDA analytical tool. From a methodo- 
logical perspective, this enabled them to judge the relevance of interviews for 
the tours. Students also learned to link different text passages to each other, as 
well as to other documents, geographical locations, diaries, educational web- 
sites, and historical images. To establish a methodological basis we analyzed 
the interviews using hermeneutical case analysis35 and comparative case analy-
sis. The codes for qualitative textual analysis that were developed in these two


31 “Shoah Foundation embarks on new mission,” PastForward 2 (2001): 2. 

32 “About us: Our mission Is to develop empathy, understanding, and respect through testi- 
mony,” USC Shoah Foundation, accessed July 20, 2020, https://sfi.usc.edu/about. 

33 Bothe, Die Geschichte der Shoah, 141. 

34 Claudio Fogu, Wulf Kansteiner, and Todd Presner, Probing the Ethics of Holocaust Culture 
(Cambridge, Massachusetts: Harvard University Press, 2016), 134-5. 

35 Udo Kuckartz and Stefan Rädiker, Analyzing Qualitative Data with MAXQDA: Text, Audio,
Video (Cham: Springer, 2019), 72.

methodological phases were allocated to the text segments to determine differ-
ent thematic categories in the transcribed interviews.

To find patterns in the testimonies we used both inductive36 and deductive
coding.37 Using multiple coding methods to analyze the same dataset and com-
paring the findings reduces certain biases.38 Deductive coding can help stu-
dents understand the structure of individual testimonies. It also improves their
understanding of narrators of different ages talking about the same historical 
event from different perspectives. The students mainly used two of MAXQDA’s 
basic analysis functions: “hierarchical category system” and “thematic summa- 
ries.” They summarized text passages to which the same code had been as- 
signed on a case-by-case basis. 

Why did I decide to use MAXQDA for qualitative analysis in a team? Mainly
because it facilitated teamwork management. When several students are
working with one dataset, it is important to create a clear system of memos,
codes and intercoder agreements39 that they can apply to that dataset.40 Since
it is essential to code data in a similar manner, I opted for MAXQDA because it
has one of the best memo retrieval systems, making it particularly useful for
teamwork.

MAXQDA allows users to mark their own favorite codes and to group codes into
code sets. Optionally, coded segments can also be weighted and annotated with
comments.41 For a comparative analysis, students used a uni-
fied thematic coding tree they had created themselves. The use of a common
code tree enabled them to find thematic intersections in their work. We also de-
veloped an intercoder agreement demonstrating how different analysts coded
the same data and we used this to identify differences in coding practices.


36 “Inductive coding method is used when you know little about the research subject and 
conducting heuristic or exploratory research. In this case, you don’t have a codebook, you’re 
building on from scratch based on your data.” Erika Yi, “Themes Don’t Just Emerge - Coding 
The Qualitative Data,” accessed 18 July, 2020. https://medium.com/@projectux/themes-dont- 
just-emerge-coding-the-qualitative-data-95aff874fdce. 

37 “Deductive coding is the coding method wherein you have developed a codebook as a ref- 
erence to guide you through the coding process. The codebook will be developed before your 
data collection starts, usually in the process of researching the existing field.” Yi, “Themes 
Don’t Just Emerge,” accessed 18 July, 2020. 

38 Michael Quinn Patton, “Enhancing the Quality and Credibility of Qualitative Analysis,” 
Health Services Research 34, no. 5 part 2 (1999): 1189-208. 

39 To create intercoder agreements, several coders process the same document independently 
and code it according to mutually agreed code definitions. 

40 Kuckartz and Rädiker, Analyzing Qualitative Data, 254.

To give a specific example of an intercoder agreement, we assigned the sub-
code “Open antisemitism” to the text passage: “Kids were really aggressive shouting
at me ‘dirty Jew.’ Fortunately, I was not there alone, but together with my brother. 
We were always able to defend ourselves. We also made good friends who always 
stood by us.” In an effort to explore and describe the social roles and interactions 
of interviewees we involved all the students in discussing our intercoder agree- 
ments. We decided that the subcode “Open antisemitism,” for example, could only 
be applied to specific cases where there was a clear social interaction between in- 
dividuals and the interviewee spoke about a clear anti-Semitic offense. By contrast, 
we defined the subcode “Hidden antisemitism” as relating to anti-Semitic texts 
and speeches in the media, or ambiguous remarks on Jewish origin.42 The students
focused primarily on the different social interactions described by witnesses. We 
defined social interaction as “a form of human behavior to which the actors in- 
volved attach subjective meaning and which is related to the behavior of others. 
The term ‘meaning’ is related to the subject and is not, according to Weber’s defini-
tion, ‘any kind of objectively correct or metaphysically explored true meaning.’”43

Our understanding of human behavior was facilitated by “methodological
triangulation,” a term used in the social sciences to describe the combined use
of two or more research methods to achieve more reliable results.44 One such
method, thematic content analysis, consists of the three stages of methodologi-
cal triangulation: pre-analysis, exploration, and interpretation.45 MAXQDA can
include all three phases and assign them to codes, summaries of texts, files,
notes and results in the form of tables and graphs. All of these functions offered
my students a wide range of analytical approaches. In addition, the tool obliged
them to observe several ethical regulations. Once the data was analyzed and
coded, we used investigator triangulation (the use of multiple researchers in an
empirical study; here, coding by more than one student) in order to further
establish reliability in the coding of our data, and employed peer review to as-
sess the analysis and coding in terms of inter-rater reliability (the extent to
which two or more coders agreed).46
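
To make the notion of inter-rater reliability more concrete, the following minimal Python sketch computes simple percent agreement and Cohen's kappa for two coders. It is an illustration only: the function names, the shortened code labels, and the eight coded segments are hypothetical, and the sketch does not reproduce MAXQDA's own intercoder agreement function or the procedure we followed in the course.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of segments to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty list of segments")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability that both coders pick the same code
    # if each assigned codes at random with their own observed frequencies.
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical example: two students code the same eight segments.
# "open" and "hidden" stand for the subcodes "Open antisemitism" and
# "Hidden antisemitism"; "none" marks a segment without either subcode.
student_1 = ["open", "open", "hidden", "none", "open", "hidden", "none", "none"]
student_2 = ["open", "hidden", "hidden", "none", "open", "hidden", "none", "open"]

print(f"percent agreement: {percent_agreement(student_1, student_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(student_1, student_2):.2f}")
```

Kappa is usually lower than raw percent agreement because it discounts the matches that two coders would produce by chance alone, which is why segments on which the coders disagree are worth discussing in class rather than simply counting.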


42 Based on the “MAXQDA 2018 Manual,” accessed July 15, 2020, https://www.maxqda.com/ 
help-max18/welcome. 

43 Quoting Max Weber, Wirtschaft und Gesellschaft: Grundriss der Verstehenden Soziologie,
5th ed. (Heidelberg: Mohr Siebeck, 2002), 4.

44 See, for example, Patton, Qualitative Research. 

45 Wendy Gordon, “Behavioral Economics and Qualitative Research - A Marriage Made in 
Heaven?,” International Journal of Market Research 53, no. 2 (2011): 171-85. 

46 Sharan B. Merriam and Elizabeth J. Tisdell, Qualitative Research: A Guide to Design and 
Implementation, 4th ed. (San Francisco, CA: John Wiley, 2015): 77.

Afterward, the students analyzed measurements of the frequency of varia-
bles and their mutual correlations. In terms of building the theoretical concepts
of our IWalk tours, we found MAXQDA’s memoing tools to be well-suited to our
requirements. The tool helped to harmonize coding approaches among the stu-
dent analysts and I could easily check that they were not digressing from the
approved coding tree.

However, as the sample grows, it becomes increasingly difficult to identify
complex patterns. We only analyzed around 20 interviews in our qualitative
study but, even with this population size,47 it became increasingly difficult to
recognize thematic and social intersections. We had to refrain from using labels
with quasi-statistical terms (typical, mainly, pattern, etc.).

The visual tool that we used the most in MAXQDA was the Document Por-
trait, which displays any text as a “painting” of either all or selected codes as-
signed throughout the text. The students could choose colors for their codes,
e.g. black for Holocaust or green for Jewish traditions, or select
some emoticons to stress positive and negative aspects. Factors that played an
important role are therefore immediately visible and easier to locate
in the interviews or any other text. The tool “takes the size of the text segments
into account and ‘weights’ the color according to the segment’s size. The color
attributes of the codes associated with the document are displayed in a matrix
with little squares arranged in rows, each one with 40 squares.”48 In my opin-
ion, this is a valuable feature that gives viewers an uncluttered visual impres-
sion of the text’s content.
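
The weighting principle described in the manual can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not MAXQDA's actual algorithm: the function name, the total of 120 squares, the rounding rule (largest remainder), and the three example segments are all hypothetical. Only the row length of 40 squares and the idea of weighting colors by segment size come from the quoted description.

```python
def document_portrait(segments, total_squares=120, row_length=40):
    """Turn coded segments into rows of colored squares.

    segments: list of (color, size) tuples in document order, where size
    is the length of the coded segment (e.g. in characters).
    """
    total_size = sum(size for _, size in segments)
    if total_size <= 0:
        raise ValueError("segments must have a positive total size")

    # Each segment receives squares in proportion to its size.
    exact = [size * total_squares / total_size for _, size in segments]
    counts = [int(share) for share in exact]

    # Hand out squares lost to rounding, largest fractional part first.
    leftover = total_squares - sum(counts)
    by_remainder = sorted(
        range(len(segments)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1

    # Lay the squares out in document order and cut them into rows.
    squares = []
    for (color, _), count in zip(segments, counts):
        squares.extend([color] * count)
    return [squares[i:i + row_length] for i in range(0, len(squares), row_length)]


# Hypothetical interview: a long passage coded "Holocaust" (black), a shorter
# one coded "Jewish traditions" (green), then another "Holocaust" passage.
rows = document_portrait([("black", 300), ("green", 100), ("black", 200)])
for row in rows:
    print("".join("B" if color == "black" else "G" for color in row))
```

Because the squares follow the order of the coded segments, the resulting picture shows not only how much of an interview a code covers but also where in the narrative it occurs.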

Students also appreciated the Code Relations Browser (CRB) visualization 
tool, which shows how codes overlap in a given document and allows quick iden- 
tification of possible connections between codes, thus enabling students to iden- 
tify all clusters with particular codes. The tool is also a good way to test the 
quality of a code system. If there are no intersections in your corpus, it may be 
indicative of problems with the way you set up your coding system. The CRB 
maps the chosen documents on the x-axis, while the y-axis contains the whole 
code system. If larger or smaller squares are located between the axes, then the 
codes overlap. You can decide whether you want to analyze a particular segment 
or the whole corpus. With this visualization, the students were able to reflect on 
and analyze incidents relating to the Holocaust49 and gain more detailed insights.
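
The idea of code overlap can be illustrated with a short Python sketch that counts how often two codes are assigned to the same segment and lists codes that never intersect with any other. It is a hypothetical example, not the way MAXQDA computes its visualization: the function names are invented, and the codes "School," "Family," "Media," and "Restitution" as well as the five segments are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def code_overlaps(segments):
    """Count, for every pair of codes, the segments that carry both codes."""
    counts = Counter()
    for codes in segments:
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts


def codes_without_overlap(segments, overlaps):
    """Codes that never share a segment with another code."""
    all_codes = set().union(*segments) if segments else set()
    overlapping = {code for pair in overlaps for code in pair}
    return all_codes - overlapping


# Hypothetical coded segments: each set holds the codes assigned to one segment.
segments = [
    {"Open antisemitism", "School"},
    {"Open antisemitism", "School", "Family"},
    {"Hidden antisemitism", "Media"},
    {"Jewish traditions", "Family"},
    {"Restitution"},
]

overlaps = code_overlaps(segments)
for (code_a, code_b), count in overlaps.most_common():
    print(f"{code_a} + {code_b}: {count}")
print("no overlap:", codes_without_overlap(segments, overlaps))
```

A code that appears in the "no overlap" list is not necessarily wrong, but it is a useful prompt to check whether the code is defined too narrowly or applied inconsistently.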


47 This is small compared to the sample size in mainstream social research. 

48 “MAXQDA 2018 Manual,” accessed July 15, 2020, https://www.maxqda.com/help-max18/vi 
sual-tools/document-portrait-visualizing-a-document. 

49 Including, for example, postwar restitutions.


6 Conclusion: IWalk and a digital hermeneutical
approach 


Digital educational tools seem to be omnipresent these days and the digital her- 
meneutics of history is no exception. “Digital hermeneutics can be defined as a 
set of skills and competences that allow historians to critically reflect on the vari-
ous interventions of digital research infrastructures, tools, databases and dissem-
ination platforms in the process of thinking, doing and narrating history.”50

To critically analyze the role of IWalk in terms of Jewish public history we 
also have to consider the ethical dimension of the whole project. 

When using digital tools to investigate the lives of Holocaust survivors,
scholars must be aware of the ethical issue of treating “Holocaust victims as
quantifiable entries in a database and [visualizing] their lives as data points
using colored pixels on a bitmap,” such as with the Digital Monument of the Jew-
ish Community in the Netherlands website.51 Sociologist Zygmunt Bauman criti-
cized this approach as an act of dehumanization in his seminal work Modernity
and the Holocaust. According to Bauman, scholars should refrain from codifying
and instrumentalizing human morality and experience, and from technifying the
subjective individuality of human experience.52 The content of IWalk, however,
does not turn narrators into unknown intangible persons, since the different war
incidents, crimes, events53 and deeds54 that they relate in the video clips enable
the narrators to live on through the screens of our devices.

According to Andreas Fickers, we need to define digital hermeneutics as a
hermeneutics of in-betweenness to give space to problematizing tensions be-
tween the analog and digital interpretation of history.55 He argues that we can-
not totally abandon strongly embedded analog practices and traditions. In fact,
it would be counterproductive to lose the current model of historical hybridity
based on the duality or parallelism of analog and digital practices.
Using a great variety of data, IWalk represents a compromise: a tool that


50 Fickers, “Digital Hermeneutics,” 2. 

51 Fogu, Probing the Ethics, 175; and “Joods Monument,” accessed July 16, 2020, https://www.

does not reject either analog or digital practice. It represents a platform
where you can comprehensively customize your data to make them comprehen-
sible to both students and the general public.

To borrow a kitchen metaphor elaborated by Anita Lucchesi,56 the raw histori-
cal datasets used in creating IWalk tours in Luxembourg had passed through the
digital kitchen and then in turn been “cooked” by the students into an interactive 
educational tool grounded in the practice of doing Jewish public history in the 
digital age. Walter Benjamin argues in his “The Storyteller” essay that we have 
“lost the ability to share experiences” or, as Todd Presner summarized it, “The 
experiences of the war event and mass death could no longer be observed, de- 
scribed, and communicated using the structures and meaning-making strategies 
reserved for historical realism, which was part and parcel of the tradition of sto- 
rytelling with clear agents, a coherent plot, and narrative strategies characterized 
by the unities of time, place, and action [. . .].”57 This argument should not be in-
terpreted as an act of resignation to historical facticity, but as a necessity for find-
ing a new epistemological balance. Presner admits that, although USC Shoah
Foundation’s VHA “assures factuality and facilitates access and preservation, it 
has the side effect of flattening differences between the testimonies and render-
ing listening one-directional.”58 Based on these facts, I assert that testimonies
should be used as a relevant historical source in education, but with the proviso 
that they have to remain in the hands of professionals (teachers or academics). 


56 See Anita Lucchesi, “For a New Hermeneutics of Practice in Digital Public History. Thinker- 
ing with memorecord.uni.lu,” (unpublished PhD diss., University of Luxembourg, 2020). Luc- 
chesi describes the mediated memories as tira-gostos (appetizers), the historical context as the 
“menu,” and the digital platforms such as Memorecord (similar to IWalk) as the “digital 
kitchen” she used for producing the digital public history product. 

57 Walter Benjamin, “The Storyteller: Reflections on the Works of Nikolai Leskov,” in Illumina-
tions, ed. Hannah Arendt, trans. Harry Zohn (New York: Schocken Books, 1968), 83; and Todd
Presner, “The Ethics of the Algorithm,” in Probing the Ethics of Holocaust Culture, ed. Claudio 
Fogu, Wulf Kansteiner, and Todd Presner (Cambridge, Massachusetts: Harvard University 
Press, 2016), 181. 

58 Presner, “The Ethics of the Algorithm,” 182.


Christopher Morse
Meaning-making in the digital museum 
Reflections on a hermeneutics of the user 


1 Introduction 


In the inaugural post for the Digital Tool Criticism blog of the University of Lux- 
embourg’s Digital History & Hermeneutics (DHH) unit, Koenig et al. assert that 
the field of human-computer interaction (HCI) has much to offer humanities 
computing.1 Indeed, while the adoption of digital tools is no longer novel to the
humanities, the increasing sophistication and interdisciplinarity of those tools 
present challenges that have long been the subject of HCI research. Recent ad- 
vances in HCI have, for example, moved us away from tool-centric design and 
toward a more nuanced and deeply reflective human-centric model. This new 
approach necessitates a rethinking of our relationship to user interface design 
for the arts and humanities, which is the subject of this chapter. 

It is through this evolving human-centric model for technology design, also 
known as user experience (UX), that I have conducted my doctoral research on 
user interfaces for digital museum collections. Central to this work is under- 
standing how interfaces mediate the experience of browsing and discovery 
within digital cultural heritage environments, such as a fine arts collection on a 
museum website. As Drucker argues, interface is what we read, and how we 
read, combined through engagement - a provocation to cognitive experience 
and to meaning-making itself.2 This conception of interface underlies my own
critical analysis, which attempts to reconcile two approaches to interface de- 
sign and development: the UX method (Fig. 1) and the digital hermeneutical 
tradition. As conceptually adjacent interpretive frameworks for the design and 
use of technologies, each system carefully considers the subjective nature of 


1 Vincent Koenig, Juliane Tatarinov, and Christopher Morse, “Tool Criticism Meets Human- 
Computer Interaction (HCI),” Digital History & Hermeneutics blog, November 25, 2019, 
accessed June 15, 2020, https://dhh.uni.lu/2019/11/25/tool-criticism-meets-human-computer- 
interaction-hci/. 

2 Johanna Drucker, “Humanities Approaches to Graphical Display,” Digital Humanities Quar-
terly 5, no. 1 (2011).

knowledge production and consumption involved in digital interaction. My
work is therefore a synthesis of these interrelated ideas, a hybrid approach that 
engages the complexity of designing meaningful interactions with cultural data 
while simultaneously reflecting on the merits of each method. 

An important outcome of this research trajectory lies in the
relationship between UX and digital hermeneutics. In this chapter I argue that
beyond merely reinforcing digital hermeneutics in the context of humanities 
computing, the UX method also offers something more: a “hermeneutics of the 
user.” This vital perspective encourages design researchers to empathize with 
the target audience of a particular technology in order to inform the design of 
meaningful interactions therein. Digital hermeneutics conceives of the histo- 
rian-as-user who is constantly mindful of methodological concerns throughout 
the digital research process,3 but UX extends this critical apparatus beyond the
pragmatic, task-oriented mode of knowledge production and into its hedonic 
and eudaimonic elements. That is to say, UX investigates the emotional quali- 
ties (hedonia) of digital interaction, and their capacity to trigger moments of 
personal reflection, empowerment, or meaning-making (eudaimonia).4 Through
a discussion of the user research process across a series of different UX studies 
I conducted, I will demonstrate the ways in which a hermeneutics of the user 
can support the design of future technologies in the arts and humanities, pay- 
ing special attention to the role of meaning-making as an important component 
of interaction and knowledge production. 


2 Experience design for digital cultural
collections


The overarching research question informing my doctoral work asks how we
might design meaningful interactivity for digital museum collections. Just as
museum professionals design physical spaces to inspire enjoyable, memorable,


3 Andreas Fickers, “Update für die Hermeneutik. Geschichtswissenschaft auf dem Weg zur Digi-
talen Forensik?,” Zeithistorische Forschungen/Studies in Contemporary History 17, no. 1 (2020).

4 Marc Hassenzahl and Noam Tractinsky, “User Experience - A Research Agenda,” Behaviour
& Information Technology 25, no. 2 (2006): 91-7; and Veronika Huta and Alan S. Waterman, 
“Eudaimonia and Its Distinction from Hedonia: Developing a Classification and Terminology 
for Understanding Conceptual and Operational Definitions,” Journal of Happiness Studies 15, 
no. 6 (2014): 1425-56. 




and, ultimately, meaningful experiences for in-person visitors,5 so too should
this same consideration extend into digital spaces where the presence of online 
museum collections has become the norm. 

It is an opportune time to be considering such a question. Decades of digiti- 
zation initiatives at museums around the world have created a massive influx 
and staggering complexity of cultural data on the Web.6 Access to this data and
meaningful navigation through it remain a challenge because the user interfa-
ces that mediate cultural collections too often rely on outdated information-
seeking behaviors.7 Consider, for example, our reliance on targeted search to
access the wide variety of information we consume on a daily basis. In the con- 
text of cultural heritage, Whitelaw describes the search bar in terms of a mu- 
seum attendant who requires visitors to request specific artworks rather than 
allowing them to casually browse the gallery floor.8 In this task-oriented mode
of information-seeking there is little room for spontaneity or serendipity. In lieu 
of exploration and discovery, digital collections become a locus for subject mat- 
ter experts, often to the detriment of casual users. 

However, meaningful interaction as a quality of user experience extends 
beyond mere browsing and discovery. UX is often misconstrued as usability, 
that is to say, narrowly concerned with a technology’s capacity to assist in the 
accomplishment of a specific task, or to fulfill a particular information need.9
This reductionist view fails to account for the cognitive, emotional, and experi- 
ential aspects that also inform perceptions of technology use, and which are 
central to the UX design process. These subjective concerns are themselves the 
very objects of a hermeneutics of the user and a primary contribution of my the- 
sis, which explores meaningful design as it relates to emerging information- 
seeking behaviors in cultural heritage. My users are adult museum visitors and, 
more specifically, the digital visitor. I investigate how their interactions within 


5 Linda Norris and Rainey Tisdale, “Developing a Toolkit for Emotion in Museums,” Exhibition, 
(2017), 100-8.

no. 6 (2019), 2311-30.

8 Mitchell Whitelaw, “Generous Interfaces for Digital Cultural Collections,” Digital Humanities
Quarterly 9, no. 1 (2015).

museums can inform their experience of an online visit. Additionally, I consider
the role of user-as-creator and the implications of involving the public in the 
co-creation of museum technologies they will one day use. Together, these indi- 
vidual approaches form a holistic view of museum technologies from the point 
of view of the user, offering new perspectives on experience design for the prac- 
tical, but also the aesthetic, emotional, or even sublime. 


3 Moving beyond the experience economy 


Why do we care about designing experiences around digital cultural collec- 
tions, and what forms might that take? Pine and Gilmore’s essay in the Harvard 
Business Review on the experience economy formalized a growing economic 
trend that foresaw a limitless commodification potential of the experience.10 In
lieu of goods and services, experience economies specialize in offering sensa- 
tions, new memories, social connections, and other forms of individualized, 
meaningful events and encounters. In opposition to this development, some 
cultural professionals have decried the experiential turn in museums, which 
they argue has compelled cultural heritage institutions to rebrand themselves 
as theme parks or cultural complexes that cheapen or trivialize their original 
missions.11

In spite of these very real pitfalls, experience design for cultural heritage 
has implications that reach far beyond its economic impact. As a comprehen- 
sive self-reflexive design thinking approach, it has emerged within HCI’s “third 
wave,” where design embraces meaning-making and critiques the notion of ef-
ficiency for efficiency’s sake.12 It is the difference between a museum app with
a robust search interface and one that employs mindfulness techniques to create 
calm moments of reflection with a single artwork. By empathizing with users, we 
come to understand how transformational museum experiences occur, and how 
they can contribute to meaning-making in digital spaces.

For Simon, meaning-making in museums comes from creating relevance,
in other words, orienting the museum’s priorities to reflect the lived realities of 
the communities they serve.13 A similar theme appears in the work of Vermee-
ren et al., who emphasize the notion of a community-centered museum rather
than a collection-centered institution.14 Aware of this trend, museums have
begun to solicit the involvement of the public in the curation of exhibits15 and
the development of new interactive technologies.16 Many of these initiatives
have grown in tandem with developments in museology and public history,
where participants are encouraged to take ownership of their own interpretive
authority and connect to the past through their own lives and perspectives.17
By drawing on the expertise of their communities, museums can make them- 
selves essential fixtures within them. 

In recent years, similar attempts to flatten the hierarchy of knowledge pro- 
duction and consumption have extended into the design of museum technologies 
and become an important area of study in HCI research. Generally speaking, how- 
ever, researchers tend to focus disproportionately on technologies inside of muse- 
ums. As Petrelli et al. argue, many museum professionals still view the museum 
and its digital initiatives as separate worlds altogether, and more recently this
divide has transposed itself into technology on-site versus technology online.¹⁸ In
the introduction to their recent monograph, Human-Computer Interaction in Muse- 
ums, Hornecker and Ciolfi acknowledge the growing interest in digital museums 
and similar platforms, but nevertheless consider the technologies as out of scope 


13 Nina Simon, The Art of Relevance (Santa Cruz, CA: Museum 2.0, 2016). 

14 Arnold P. O. S. Vermeeren et al., “Future Museum Experience Design: Crowds, Ecosystems 
and Novel Technologies,” in Museum Experience Design, ed. Arnold Vermeeren, Licia Calvi, 
and Amalia Sabiescu (Cham: Springer International Publishing, 2018), 1-16. 

15 Luigina Ciolfi, Liam J. Bannon, and Mikael Fernström, “Including Visitor Contributions in 
Cultural Heritage Installations: Designing for Participation,” Museum Management and Cura- 
torship 23, no. 4 (2008): 353-65; and Amy S. Weisser and Alison Koch, “Talking Through Our 
Pain: Visitor Responses at the 9/11 Memorial Museum,” Exhibition vol. 36, no. 1 (2017): 78. 

16 Joel Lanir et al., “The Influence of a Location-Aware Mobile Guide on Museum Visitors’ Be- 
havior,” Interacting with Computers 25, no. 6 (2013): 443-60; and Paul F. Marty, “My Lost Mu- 
seum: User Expectations and Motivations for Creating Personal Digital Collections on Museum 
Websites,” Library & Information Science Research 33, no. 3 (2011): 211-9. 

17 Benjamin Filene, “History Museums and Identity: Finding ‘Them,’ ‘Me,’ and ‘Us’ in the Gal- 
lery,” in The Oxford Handbook of Public History, ed. James B. Gardner and Paula Hamilton (Ox- 
ford: Oxford University Press, 2017). 

18 Daniela Petrelli et al., “Tangible Data Souvenirs as a Bridge Between a Physical Museum Visit 
and Online Digital Experience,” Personal and Ubiquitous Computing 21, no. 2 (2017): 281-295. 


282 —— Christopher Morse 


for their study.¹⁹ For museum collections on the Web, this overlooks many important
and unresolved challenges that warrant closer inspection.

Digital collections struggle to engage users, resulting in platforms that go 
unused or remain largely underappreciated.²⁰ The increasing power of Web
technologies has not been matched by an equally sophisticated translation of modern
museological theory and practice into the digital.²¹ Many collections default to
static libraries of objects, often displayed out of context and accompanied by
an authoritative wall of text, a passive recapitulation of the museum’s colonial
history as the ultimate purveyor of culture. This didactic approach to digital
museum learning does not take advantage of the vast potential afforded by
digital spaces, and is instead reminiscent of the nineteenth-century museum.²²
Digital collections also face the challenge of authenticity, which is to say the
direct and tangible confrontation with artworks.²³ Dematerialized museum objects
do not have the same perceived value as their physical counterparts, but
nevertheless many museums attempt to recreate their physicality on the Web, 
often through the creation of virtual gallery walk-throughs that have only lim- 
ited interactivity. 

Meaningful interaction design for digital cultural collections confronts a 
number of important challenges, as described above. Each of these challenges 
directly implicates the user. In cases where museum technologies fail to en- 
gage, this is arguably due to a kind of user myopia. Interactive systems should 
not only acknowledge the collections on display, but also the visitors in the gal- 
lery, even if the gallery is in cyberspace. 

How then should we understand the notion of an experience? In their semi- 
nal research agenda on UX design, Hassenzahl and Tractinsky extend the 
notion of user experience beyond the instrumental, that is to say, beyond pragmatic
or task-oriented behaviors, and into the more complex emotional and dynamic
aspects of product use.²⁴ They describe UX as “a consequence of a user’s
internal state, the characteristics of the designed system,” and “the context 
within which interaction occurs.” In this framework, therefore, we can under- 
stand the notion of experience as a complex phenomenon resulting from the 
coalescing of personal disposition, situational circumstances, and a particular 
product, service, or technology. Museum experience design, therefore, must 
take these factors into account. 


4 Meaning-making in the museum experience 


Museum experience design has emerged within the HCI community as a result 
of advances in UX research and its application to the cultural heritage domain. 
A primary objective is to design meaningful technologies. But how can we de- 
fine meaning, and what makes technology meaningful? 

Meaning-making matters, argues Simon, and relevance is the key to unlocking
it within museum spaces and communities.²⁵ Cultural professionals
have increasingly adopted this perspective, and with good reason. Seminal
work by Falk and Dierking on visitor experiences found that memories of museum
visits are persistent, salient, and highly personal.²⁶ In their later work,
they emphasized the cognitive and social aspects of the museum visit that work
in tandem with the design of museum spaces and the curation of exhibits to
create meaning and inspire learning.²⁷ For both Simon, and Falk and Dierking,
the museum resonates far beyond its front door. More than merely a metaphor, 
museum outreach is a tangible reality — particularly in the digital, where online 
galleries and virtual exhibits have become the norm for many museums around 
the world. 

Stepping back for a moment from how we might design for meaningful expe- 
riences, we must first consider more critically the question, What is meaning? 
Mekler and Hornbæk illustrate the inconsistent use of the term “meaningful” in 
HCI literature, noting that it may refer to the user experience of a system, a particular
artifact or occurrence, or even a user’s interpretation of their own interactions.²⁸
As the authors describe, in many cases publications make explicit use of
the term “meaningful” in their titles, only to completely avoid defining it within 
the text. The term has been used quite freely, but recent work has sought to more 
clearly define this concept. 

In the case of Simon, meaningfulness results, at least in part, from cultivating
relevance.²⁹ For Falk and Dierking, meaningfulness arrives through the
complex interplay of personal, physical, and sociocultural contexts that inform
our thoughts, behaviors, and underlying motivations.³⁰ Falk’s typology
of museum visitors, a collection of museum-specific identities that represent
typical visitor behaviors, carefully considers the role of identity and personal
motivation in learning and meaning-making.³¹ Take for example the identity
of the “recharger,” who experiences the museum as a calm respite away from 
the world, where they connect with objects, artworks, and with themselves 
through contemplation or spirituality. Falk’s attempt to identify the underly- 
ing motivations that catalyze museum visitors to construct meaning offers a 
window into meaning-making that has actionable implications for designers 
of cultural technologies who are considering how to connect with different 
kinds of audiences. 

Another aspect of meaning appears in the UX research agenda of Hassenzahl 
and Tractinsky, where the authors contrast pragmatic usability with hedonic 
pleasure and stimulation, that is to say, the emotional and often subjective reactions
that arise during interaction with a product, service, or technology.³²

Interacting with a useful product may bring satisfaction, but interacting 
with a beautiful and useful product offers something even more. Successfully 
retrieving a sought-after museum object while navigating through a digital col- 
lection might make you feel capable or smart, but watching the object come to 
life through an engaging digital storytelling experience may trigger interest, cu- 
riosity, or excitement. 
An additional layer of experience comes in the form of eudaimonia, which
Huta and Waterman describe as growth, meaning, authenticity, and excellence.³³
Often contrasted with hedonic experiences, which generally represent short-term 
pleasures and comforts, eudaimonia comprises those experiences that trigger 
long-term change, such as personal development or the feeling of well-being. In 
their analysis of eudaimonia and hedonia, Huta and Waterman emphasize the 
challenges of discerning between the two, as they are often interrelated — but 
even amidst this tension researchers have yet another lens through which to consider
the notion of meaningfulness in design.³⁴ This lens draws from an empirically
based understanding of subjective experience, sense of self, and personal
motivations. 

Returning to Mekler and Hornbæk, we see in their work many of the aforementioned
ideas aggregated into a framework of meaning that attempts to answer
the important question, What makes interaction good?³⁵ Their framework
presents a series of five criteria that underlie the experience of meaning during 
interaction: connectedness, purpose, coherence, resonance, and significance. 
Meaning emerges as a result of a personal connection, or in relation to our partic- 
ular circumstances in the world (connectedness). It aligns with or even chal- 
lenges our aims, goals, and personal agency in life (purpose). Meaning happens 
when something makes sense to us — when we are able to understand how an 
experience fits into our perception or worldview (coherence). It has an intuitive 
quality insofar as it carries an inherent, unspoken feeling of rightness or wrong- 
ness that we feel “clicks” with us or does not (resonance). Finally, it is nontrivial: 
meaning has lasting impact that matters (significance). 

Much like the concepts of experience and experience design, meaning- 
making is a complex process inextricable from the thoughts, feelings, and per- 
sonal identities of users. In the development of my own project, “meaningful 
design” has come to embody this synthesis of ideas as they make themselves 
relevant throughout the design process. However, from the perspective of mu- 
seum experience design, it has also become clear that meaning has an impor- 
tant communal aspect that museums must consider, even in digital spaces. As 
museums increasingly embrace their emerging role as centers for public activi- 
ties of all kinds, so too should they reconsider their digital spaces in order to 
accommodate for these activities. 


33 Huta and Waterman, “Eudaimonia.” 
34 Huta and Waterman, “Eudaimonia.” 
35 Mekler and Hornbæk, “A Framework.”
5 Applying a user-centered methodology to the
design of digital collections 


UX design is still a relative newcomer to cultural heritage and humanities com- 
puting. In a recent systematic review of visualization tools in cultural heritage, 
Windhager et al. reported that more than half of publications surveyed did not 
include any kind of user study.³⁶ Therefore, an important objective of my project
is to serve as a case study in the design of digital tools for cultural heritage
using a UX process. Understanding the triggers behind memorable museum ex- 
periences and the ways in which museum visitors interact with arts and culture 
online are critical steps to inform the design of engaging digital browsing expe- 
riences. Grounding the design and implementation of these new features in the 
UX process ensures a close relationship between user and technology, curator 
and museumgoer, and an overall improvement in the usability of the interac- 
tive system. 

Central to the UX method (Fig. 1) is an iterative cycle consisting of five steps: 
planning, exploration, ideation, generation, and evaluation.³⁷ This method is applicable
to virtually any research project or idea, allowing museum technology
designers to empathize with the users for whom they are designing, while also 
providing methods to empirically measure the opportunities and pain points im- 
plicated in the design of the technology. 

Here, I contrast the UX method with that of digital hermeneutics as described
by Fickers, which identifies algorithm criticism, digital source criticism,
tool criticism, and interface criticism throughout the research process.³⁸
Additionally, other forms of multimodal literacy, such as data criticism and 
simulation criticism, are described within this framework. As an update to the 
classical tradition of Schleiermacher, Dilthey, Heidegger, and Habermas, digi- 
tal hermeneutics seeks to expose and critique the oftentimes invisible layers of 
meaning-making that happen within automated environments. It posits that 
tools, interfaces, and encodings have implications for the collection, interpretation,
and presentation of data, and in doing so establishes new, critically oriented
information behaviors within the digital research process.³⁹


36 Windhager et al., “Visualization.” 

37 Carine Lallemand and Guillaume Gronier, Méthodes de design UX, 2nd ed. (Paris: Eyrolles,
2018).

38 Andreas Fickers, “Update für die Hermeneutik. Geschichtswissenschaft auf dem Weg zur
digitalen Forensik?”, Zeithistorische Forschungen/Studies in Contemporary History, Online-Ausgabe,
17, no. 1 (2020).

39 Fickers, “Update.” 
[Figure 1: the five steps of the UX design cycle and their associated methods.
Planning: UX strategy, project management.
Exploration: interviews, focus groups, observation, design probes.
Ideation: co-design workshops, creativity techniques, personas and user journeys.
Generation: storyboarding, low- and high-fidelity iterative prototyping.
Evaluation: lab or field user testing.]


Fig. 1: UX design method scheme. 2018. © Carine Lallemand, University of Luxembourg, first
published in French in: Lallemand, Carine, and Guillaume Gronier. Méthodes de design UX.
2nd ed. Paris: Eyrolles, 2018.


As iterative frameworks for the design and use of technologies, the UX method 
and the digital hermeneutical tradition both represent a new critical apparatus for 
digital research. I suggest, however, that while digital hermeneutics contributes to 
a deeper understanding of the instrumental and goal-oriented aspects of the re- 
search process (e.g. identifying bias in data visualization, critiquing sources, etc.), 
a hermeneutics of the user that arises from the UX method considers as well the 
aesthetics of use, its subjective or emotional experience, the personal dispositions 
of its users (e.g. information-seeking habits, technology preferences, etc.), and its 
temporality (e.g. before, after, and long after use). To put this more into perspec- 
tive, following are two of the user studies on museum visitors that I conducted 
during my research process, and which embody these aspects. 


6 Study 1: Experience narratives and the 
meaningful museum experience 


Although the aura of a museum object is not easily translatable into the digital,⁴⁰
there are other kinds of experiences that designers in the cultural sector
might consider instead. In the first of the two studies discussed here, I sought 
to uncover how meaning-making occurs for museum visitors during in-person 
visits. By understanding what happens during physical visits, we can derive 
new approaches to designing for the digital. 


40 Evrard and Krebs, “Authenticity.” 
Building on the work of Falk and Dierking, and Henry, whose interviews
with museum visitors shed light on the complex nature of museum memories,
my study endeavored to advance this research in the direction of “experience
triggers.”⁴¹ Experience triggers represent the various phenomena that coalesce
to form a memorable experience. In this case, triggers may be thoughts, feel- 
ings, encounters, objects, or any tangible or intangible aspect of the museum 
visit. In order to identify these triggers, I invited 32 participants to the User Lab 
at the University of Luxembourg to discuss their most meaningful museum ex- 
periences. During semistructured interviews, I asked participants to discuss 
aloud five to ten memorable museum experiences and then to report on each of 
them using an “experience narrative template” (Fig. 2).

Research has demonstrated that experience narratives contribute to a more 
holistic understanding of user interaction with digital technologies, and that 
emotional qualities — both positive and negative — are deeply intertwined with 
experience.⁴² We designed an experience narrative report that contained four
sections: object description, keywords, rating, and emotions. 

The object description field asked participants for information about the ob- 
ject, artwork, or museum being discussed. This information included the name, 
location, and date of visit. The keywords section asked participants for three key- 
words that came to mind when considering their overall experience. These key- 
words could represent salient aspects of the memories themselves, or even how 
participants were feeling at the time of the experience. The goal was to give par- 
ticipants autonomy in their keyword choice. In the third section, participants 
rated the overall museum experience from 0 (very bad) to 10 (very good). Finally, 
in the emotions section, together with fellow researchers, I employed the Geneva 
Emotion Wheel (GEW), an empirically tested psychometric tool that allows participants
to rate their emotional response and corresponding intensity to objects,
events, and situations across an axis of valence and control.⁴³
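
For readers who wish to adapt the template to a digital form, the four sections described above can be captured in a small record type. The following is a minimal sketch in Python, not the database schema used in the study: the class name, the field names, and the 0 to 5 intensity range for the emotion ratings are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ExperienceNarrative:
    """One narrative report (hypothetical structure, for illustration only)."""

    # Section 1: object description
    object_name: str
    location: str
    visit_date: str  # free text, since a participant may recall only a year
    # Section 2: exactly three keywords chosen freely by the participant
    keywords: Tuple[str, str, str]
    # Section 3: overall rating from 0 (very bad) to 10 (very good)
    rating: int
    # Section 4: emotion label -> intensity (the 0-5 range is an assumption)
    emotions: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.keywords) != 3:
            raise ValueError("exactly three keywords are expected")
        if not 0 <= self.rating <= 10:
            raise ValueError("rating must lie between 0 and 10")
        for label, intensity in self.emotions.items():
            if not 0 <= intensity <= 5:
                raise ValueError(
                    f"intensity for {label!r} must lie between 0 and 5"
                )
```

Validating the record at creation time keeps malformed reports out of later analysis, which matters when the same records feed both a database and a qualitative coding step.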


41 Falk and Dierking, “Recalling”; and Carole Henry, “How Visitors Relate to Museum Expe- 
riences: An Analysis of Positive and Negative Reactions,” Journal of Aesthetic Education 34, 
no. 2 (2000): 99. 

42 Marc Hassenzahl, Sarah Diefenbach, and Anja Göritz, “Needs, Affect, and Interactive Prod- 
ucts - Facets of User Experience,” Interacting with Computers 22, no. 5 (2010): 353-62; Timo 
Partala and Aleksi Kallinen, “Understanding the Most Satisfying and Unsatisfying User Experi- 
ences,” Interacting with Computers, 24, no. 1 (2012); and Alexandre N. Tuch, Rune Trusell, and 
Kasper Hornbæk, “Analyzing Users’ Narratives to Understand Experience with Interactive 
Products,” (SIGCHI Conference on Human Factors in Computing Systems, Paris, 2013), 207-9. 
43 Klaus R. Scherer, “What Are Emotions? And How Can They Be Measured?,” Social Science 
Information 44, no. 4 (2005): 695-729; and Klaus R. Scherer et al., “The GRID Meets the Wheel: 
Assessing Emotional Feeling via Self-Report,” in Components of Emotional Meaning: A 
[Figure 2: experience narrative report form with four numbered prompts.
1. Please describe the artwork (name of art/exhibit, location, date).
2. Please describe your experience in three words.
3. Please rate the overall experience.
4. Please describe how you felt during the experience (emotion wheel).]


Fig. 2: Experience narrative report template. 2019. © design by Christopher Morse. The wheel
of emotions is part of the Geneva Emotion Wheel, published in the works of Klaus R. Scherer
et al. (see references) and Swiss Center for Affective Sciences, https://www.unige.ch/cisa/gew/,
accessed February 7, 2022.


In total, we collected almost 200 individual narrative reports on partici- 
pants’ meaningful museum experiences at museums around the world, which 
we documented both in a database and as verbatim transcriptions from each 
interview. We then performed an iterative thematic analysis on the transcrip- 
tions, paying close attention to the varying aspects of each experience that 
made it memorable for the participant. This “close reading” of the user through 
the collected experience narratives helped us to identify 23 unique triggers for 
memorable museum experiences, which we later incorporated into a framework 
of museum experience. 
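
Once each narrative has been coded by hand, a simple tally can show how often each trigger occurs and which triggers tend to appear together. The sketch below is illustrative only: the function names and the sample codes are hypothetical, and it does not reproduce the analysis procedure used in the study.

```python
from collections import Counter
from itertools import combinations


def trigger_frequencies(coded_reports):
    """Count the number of reports in which each trigger code appears."""
    counts = Counter()
    for codes in coded_reports:
        # A set ensures a trigger is counted once per report.
        counts.update(set(codes))
    return counts


def trigger_cooccurrences(coded_reports):
    """Count how often each unordered pair of triggers shares a report."""
    pairs = Counter()
    for codes in coded_reports:
        # Sorting gives every pair a single canonical order.
        pairs.update(combinations(sorted(set(codes)), 2))
    return pairs


if __name__ == "__main__":
    # Invented sample data: one list of trigger codes per narrative report.
    sample = [
        ["discovery", "fondness"],
        ["discovery"],
        ["fondness", "discovery", "VIP"],
    ]
    print(trigger_frequencies(sample).most_common())
    print(trigger_cooccurrences(sample).most_common())
```

Counts of this kind complement the close reading of transcripts rather than replace it, since the codes themselves remain the product of interpretation.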

Using this framework, I designed the Museum Experience Cards shown in 
Fig. 3 as an ideation tool and for use in design workshops with museum visi- 
tors. The ideation phase, which corresponds to the third step in the UX pro- 
cess of Fig. 1, built on the wisdom gained during my exploratory research and 
user studies, and included targeted brainstorming sessions, user modeling 


Sourcebook, edited by J. R. J. Fontaine, K. R. Scherer, and C. Soriano. (Oxford: Oxford Univer- 
sity Press, 2013): 281-98. 
(e.g. personas), creativity workshops, and other activities that can inform the
future design of technology prototypes. Ideation cards, such as the Museum
Experience Cards, are commonly used during this phase, and studies have established
their effectiveness in supporting creativity during design and co-creation
activities.⁴⁴


[Figure 3: two sample cards.
INSPIRATION: Feeling inspired or empowered, or feeling uninspired and
disempowered as a result of an experience.
INTROSPECTION: Positive or negative introspective, self-reflective,
philosophical, or contemplative experiences.]


Fig. 3: Sample Museum Experience Card. 2018. © Christopher Morse.


The Museum Experience Cards comprise one card for each experience trigger. 
The image and text on each card come directly from the experience narrative re- 
ports; as such they closely mirror the needs, expectations, and motivations of 
museum visitors — that is to say, future users. The “expectations” trigger, illus- 
trated on the left of Fig. 3, represents experiences where visitors either excitedly 
anticipate a particular exhibit or, conversely, feel disappointment when seeing a 
blockbuster exhibit and thinking to themselves, Oh, is that all? Another impor- 
tant trigger is “VIP” status, such as when a visitor receives some kind of special 
treatment or privilege (e.g. a private museum tour or unrestricted access to an 


44 Joanna Kwiatkowska, Agnieszka Szöstek, and David Lamas, “(Un)structured Sources of In- 
spiration: Comparing the Effects of Game-Like Cards and Design Cards on Creativity in Co- 
Design Process,” (13th Participatory Design Conference, Windhoek, Namibia, 2014), 31-9; and 
Kim Halskov and Peter Dalsgard, “Inspiration Card Workshops,” (6th ACM Conference on De- 
signing Interactive Systems, University Park, PA, 2006), 2. 
object). It is rare that an experience is memorable due to only a single trigger.
Instead, meaningful memories arise from a complex interplay of several triggers
at once.

During the ideation phase I hosted a series of “design jam” events at Luxem- 
bourg’s National Museum of History and Art (MNHA) with the goal of involving 
members of the public in the design of museum interfaces, and also to test the 
Museum Experience Cards as a tool for inspiration during the session. Design jam 
participants broke up into teams, identified a museum digital collections-related 
design challenge to solve, and created their own low-fidelity prototypes as a part 
of those solutions. One group selected the “responsibility” trigger as their design 
case study. Responsibility here relates to the moral and ethical implications of the 
museum experience, such as notions of spoliation, curation of difficult themes, 
and even legal copyright. In particular, the group focused on pushing the boundaries
of fair use by designing a platform that gave users the ability to remix artwork
and re-curate it on their own terms. In this way, the ideation cards served as
a springboard for a design-thinking approach to solutions.

The framework of experience triggers provides technology designers in the 
cultural sector with a toolkit to better understand the various experiential cate- 
gories relevant to the museum visit, and in doing so allows them to empathize 
with members of the public as future users. Rather than merely building tools 
that showcase the museum’s physical holdings, designers can target specific 
kinds of interaction, such as creating moments of “discovery” or “fondness.” 
Moreover, the resulting museum experience ideation cards can help nonexpert 
users (e.g. members of the public) participate in design thinking processes with 
museums, giving them a bridge to share their own experiences and to co-create 
the technologies they will one day use. 


7 Study 2: Rich-prospect browsing 


In recent years a number of user interface design frameworks have emerged 
within the context of visual collections. One such design framework is called 
“rich-prospect browsing.” Described initially by Ruecker et al., rich-prospect interfaces
visualize the entirety of a visual collection first, and then allow users
to zoom in for more details.⁴⁵ Additionally, interfaces of this variety typically
have a suite of features that allow users to navigate the collections in different 


45 Stan Ruecker, Milena Radzikowska, and Stéfan Sinclair, Visual Interface Design for Digital 
Cultural Heritage: A Guide to Rich-Prospect Browsing (Farnham, England: Ashgate, 2011). 
ways, such as by specific metadata, flexible visual layouts, etc. A good example
is the Coins interface, which visualizes the entire coin collection of
the Münzkabinett Berlin, one of the world’s largest numismatic collections.⁴⁶

Advances in Web technologies have resulted in the emergence of new inter- 
faces of this variety, but few studies exist to understand their effectiveness and 
their implications for user experience. Therefore, the second exploratory study 
in my project invited 30 participants to the User Lab to test three different
rich-prospect browsers: Coins, Curator Table, and Museum of the World.⁴⁷

The Coins interface represents one type of digital cultural collection, namely 
numismatics. Curator Table, built by Google Arts & Culture, visualizes the collec- 
tions of 600+ partnering institutions in the form of a giant landscape. This collec- 
tion consists primarily of visual art. Finally, the Museum of the World interface 
of the British Museum (in collaboration with Google) provides visitors with a 3D 
geographical timeline of a selection of the museum’s collections, mainly archaeo- 
logical in nature. 

We asked participants in our study to spend ten minutes with each interface 
and report on their experiences through interviews and the think-aloud tech- 
nique (describing the experience of using the interface aloud throughout each 
session). Additionally, we evaluated the user experience of each collection using 
a well-established UX scale called the AttrakDiff, originally developed by Hassenzahl
et al.⁴⁸ The AttrakDiff establishes four empirically measurable elements
within user experience: pragmatic quality (PQ), hedonic quality — identification 
(HQ-I), hedonic quality - stimulation (HQ-S), and global attractiveness (ATT). 

PQ measures how well a technology allows users to complete a task (e.g. 
search for artwork, compare objects, learn about an artist, etc.). This compo- 
nent represents notions of task-oriented usability more generally. HQ-I meas- 
ures how well the user self-identifies with the technology. In other words, this 
aspect of experience considers the role of self-image and self-expression arising 
as a result of using the technology. For example, a musician may closely iden- 
tify with a specific mobile app for tuning their instrument because its function- 
ality corresponds well with how they structure their practice. HQ-S measures 
the level of stimulation engendered. Is the technology novel and engaging? To 


46 Flavio Gortana et al., “COINS - A Journey Through a Rich Cultural Collection,” Münzkabinett 
Berlin, accessed September 16, 2019, https://uclab.fh-potsdam.de/coins. 

47 Christopher Morse et al., “Art in Rich-Prospect: Evaluating Next-Generation User Interfaces for 
Cultural Heritage,” (Annual Conference of Museums and the Web, Boston, Massachusetts, 2019). 
48 Marc Hassenzahl, Michael Burmester, and Franz Koller, “AttrakDiff: Ein Fragebogen zur Messung
Wahrgenommener Hedonischer und Pragmatischer Qualität,” in Mensch & Computer 2003,
ed. Gerd Szwillus and Jürgen Ziegler, vol. 57 (Wiesbaden: Vieweg+Teubner Verlag, 2003), 187-96.
what extent does the technology break conventions? This component considers
originality and perceived stimulation during use. Finally, ATT measures the 
aesthetics of use and the technology’s perceived value according to users. 
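
To make the four components concrete, the sketch below shows one way questionnaire responses could be summarized per interface. It is a hedged illustration rather than the scoring procedure used in the study: it assumes that every item is rated on a seven-point scale coded from -3 to +3, that a subscale score is the mean of its items, and that the caller has already grouped the item ratings by subscale. The function names and the input layout are hypothetical.

```python
from statistics import mean

SCALES = ("PQ", "HQ-I", "HQ-S", "ATT")


def subscale_scores(responses):
    """Return the mean rating per subscale for one participant.

    `responses` maps each subscale name to a list of item ratings,
    each assumed to lie between -3 and +3.
    """
    scores = {}
    for scale in SCALES:
        items = responses[scale]
        if not items:
            raise ValueError(f"no ratings supplied for {scale}")
        if any(not -3 <= rating <= 3 for rating in items):
            raise ValueError(f"ratings for {scale} must lie between -3 and 3")
        scores[scale] = mean(items)
    return scores


def interface_profile(all_responses):
    """Average each subscale score over all participants for one interface."""
    per_participant = [subscale_scores(r) for r in all_responses]
    if not per_participant:
        raise ValueError("at least one participant is required")
    return {
        scale: mean(p[scale] for p in per_participant) for scale in SCALES
    }
```

Computing one profile per interface makes it possible to compare, for example, whether a browser that scores high on stimulation (HQ-S) does so at the cost of pragmatic quality (PQ).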

During the interviews and think-aloud sessions we learned that many users 
struggled to understand the context and underlying structure of the interfaces. 
In some cases users wanted more explicit access to search bar functionality, 
which highlighted a lingering reliance on conventional information-seeking be- 
haviors. We also found that users had polarized reactions to the feature that 
visualized cultural collections in their entirety. As one participant commented, 
“Everything is there, but nothing is there.” In other words, by seeing everything 
at once you are too overwhelmed to really see anything at all. Another partici- 
pant described the experience as interesting but impractical. 

Rich-prospect browsing purports to have many distinct advantages over 
other user interface design frameworks. For example, by visualizing a collection 
in its entirety, no single item will be lost within the depths of the digital reposi- 
tory, as so many objects often are. All objects are accessible right from the start. 
However, rich-prospect also presents new challenges. Many people have not 
yet adopted the information-seeking behaviors that are necessary to fully engage
with a dynamic digital collection of this kind. Moreover, visualizing the
entirety of a collection by itself is not enough. Perhaps our most important find- 
ing was that rich-prospect browsing suffers from a lack of context. Users strug- 
gled to understand why certain visualization patterns were used (e.g. is the 
Curator Table visualization supposed to be a map, a landscape, something ran- 
dom?), or what the explicit purpose of the interface was in the first place. Un- 
derstanding the perspective of the user provided us with meaningful insights 
on the technologies and their future development. 


8 A hermeneutics of the user 


Digital hermeneutics concerns itself with criticism, whether that be source criti- 
cism, tool criticism, algorithmic criticism, or any other reflective apparatus. In 
my discussion of the hermeneutics of the user I carefully avoid suggesting the 
term “user criticism.” Arguably, the term “user hermeneutics” (or “a hermeneu- 
tics of the user”) has a different underlying objective. The framework of digital 
hermeneutics endeavors to expose bias in the data, the algorithm, or the tool, 
ultimately informing the researcher how to frame their results as objectively as 
possible. In contrast, a hermeneutics of the user is highly nuanced and deeply 
subjective. We might say instead that it is an empirically based measurement of 
bias in a particular target individual or group and that, through those biases,
designers can elevate users’ interactions with technologies. 

We might apply this hermeneutics of the user to more closely understand 
how to design tools for humanities interpretation rather than uncritically borrow- 
ing tools from the natural and social sciences. In Drucker’s article on graphical 
display in the humanities, she argues for an approach to the design of tools that 
are “rooted in a co-dependent relation between observer and experience.”49 From
this perspective, temporal realities may warp based on fleeting or long-standing 
emotional states, demographic statistics might be dynamic based on how a per- 
son self-identifies, or a Cartesian map may display in fish-eye lens mode based on 
the subjective reactions of a witness to an external event.50 In human-computer
interaction, similar initiatives, such as feminist HCI, theorize about how to
construct interactive systems that embody notions of agency, fulfillment, identity and
the self, equity, empowerment, diversity, and social justice.51

My doctoral research attempts to draw similar conclusions in the context of 
museum technologies. By understanding different kinds of museum visitors — 
their needs, expectations, and motivations — my project aims to reveal new ave- 
nues for meaningful interaction design in digital spaces. The experience must 
go beyond instrumentality or mere usability, embracing as well the hedonic 
and eudaimonic qualities that factor into our perceptions of technology use. As 
such, user experience design and, by proxy, a hermeneutics of the user contrib- 
ute to the advancement of current trends that are shifting us toward a human- 
centered, holistic approach to the conceptualization of technologies altogether. 


9 Conclusion 


At first glance it may appear that the UX method exists on a different timeline 
than digital hermeneutics. The UX process in its ideal state assumes that a re- 
search tool (or museum user interface) has not yet been developed and that the 
previously discussed five-step iteration will be applied from beginning to end. 
In contrast, digital hermeneutics often concerns itself with what already exists, 
whether that be digitized sources in a virtual gallery or network visualization 
software to make sense of one’s data. However, this assertion could not be 


49 Drucker, “Humanities Approaches.” 

50 Drucker, “Humanities Approaches.” 

51 Shaowen Bardzell, “Feminist HCI: Taking Stock and Outlining an Agenda for Design,” (28th 
International Conference on Human Factors in Computing Systems, Atlanta, GA, 2010), 1301. 
further from the truth. First, the UX process is an iterative cycle. Designers can
at any point re-engage their target audience in order to learn how to improve a 
product they have already generated. UX understands that technology is not a 
static thing but, rather, an evolving experience. And while it is true that digital 
hermeneutics as a critical apparatus offers an analytical approach to reflect on 
technologies currently in use, its central commitments of transparency, multi- 
modality, and the non-neutrality of technology are essential to the process of 
visionary design. Both digital hermeneutics and the UX method have forward- 
looking and retrospective potentialities. 

In my own doctoral research, both approaches shed light on the challenges 
of designing technologies that have a practical function (e.g. object search and 
retrieval) as well as a much deeper, experiential quality (e.g. serendipity or 
wonderment). I am now beginning the final phases of the design process in my 
research and evaluating the digital prototypes that emerge as a result. I have 
already witnessed firsthand the importance and relevance of a user hermeneutics,
as it has provided my project with valuable insights about meaning-making
and information-seeking. As technology becomes increasingly attuned
not only to our preferences but also to our words, our gestures, and even our
moods, understanding the unlimited diversity of the user will be paramount.